Abstract

We propose a nonparametric method for estimating the pricing formula of a derivative asset using learning
networks. Although not a substitute for the more traditional arbitrage-based pricing formulas, network
pricing formulas may be more accurate and computationally more efficient alternatives when the
underlying asset’s price dynamics are unknown, or when the pricing equation associated with the no-arbitrage
condition cannot be solved analytically. To assess the potential value of network pricing formulas, we
simulate Black-Scholes option prices and show that learning networks can recover the Black-Scholes formula
from a two-year training set of daily options prices, and that the resulting network formula can be used
successfully to both price and delta-hedge options out-of-sample. For comparison, we estimate models
using four popular methods: ordinary least squares, radial basis function networks, multilayer perceptron
networks, and projection pursuit. To illustrate the practical relevance of our network pricing approach,
we apply it to the pricing and delta-hedging of S&P 500 futures options from 1987 to 1991.
1 Introduction

Much of the success and growth of the market for options
and other derivative securities may be traced to
the seminal papers by Black and Scholes (1973) and Merton
(1973), in which closed-form option pricing formulas
were obtained through a dynamic hedging argument and
a no-arbitrage condition. The celebrated Black-Scholes
and Merton pricing formulas have now been generalized,
extended, and applied to such a vast array of securities
and contexts that it is virtually impossible to provide an
exhaustive catalog. Moreover, while closed-form expressions
are not available in many of these generalizations
and extensions, pricing formulas may still be obtained
numerically.

In each case, the derivation of the pricing formula
via the hedging/no-arbitrage approach, either analytically
or numerically, depends intimately on the particular
parametric form of the underlying asset’s price dynamics
S(t). A misspecification of the stochastic process
for S(t) will lead to systematic pricing and hedging errors
for derivative securities linked to S(t). Therefore,
the success or failure of the traditional approach to pricing
and hedging derivative securities, which we call a
parametric pricing method, is closely tied to the ability
to capture the dynamics of the underlying asset’s price
process.

In this paper, we propose an alternative data-driven
method for pricing and hedging derivative securities, a
nonparametric pricing method, in which the data is allowed
to determine both the dynamics of S(t) and its relation
to the prices of derivative securities with minimal
assumptions on S(t) and the derivative pricing model.
We take as inputs the primary economic variables that
influence the derivative’s price, e.g., current fundamental
asset price, strike price, time-to-maturity, etc., and
define the derivative price to be the output into which
the learning network maps the inputs. When properly
trained, the network “becomes” the derivative pricing
formula which may be used in the same way that formulas
obtained from the parametric pricing method are
used: for pricing, delta-hedging, simulation exercises,
etc.

These network-based models have several important
advantages over the more traditional parametric models.
First, since they do not rely on restrictive parametric
assumptions such as lognormality or sample-path continuity,
they are robust to the specification errors that plague
parametric models. Second, they are adaptive, and respond
to structural changes in the data-generating processes
in ways that parametric models cannot. Finally,
they are flexible enough to encompass a wide range of
derivative securities and fundamental asset price dynamics,
yet relatively simple to implement.

Of course, all these advantages do not come without
some cost: the nonparametric pricing method is
highly data-intensive, requiring large quantities of historical
prices to obtain a sufficiently well-trained network.
Therefore, such an approach would be inappropriate
for thinly-traded derivatives, or newly-created derivatives
that have no similar counterparts among existing
securities. 1 Also, if the fundamental asset’s price dynamics
are well-understood and an analytical expression for
the derivative’s price is available under these dynamics,
then the parametric formula will almost always dominate
the network formula in pricing and hedging accuracy.
Nevertheless, these conditions occur rarely enough that
there may still be great practical value in constructing
derivative pricing formulas by learning networks.

In Section 2, we provide a brief review of learning
networks and related statistical methods. To illustrate
the promise of learning networks in derivative pricing
applications, in Section 3 we report the results of several
Monte Carlo simulation experiments in which radial
basis function (RBF) networks “discover” the Black-Scholes
formula when trained on Black-Scholes call option
prices. Moreover, the RBF network pricing formula
performs as well as the Black-Scholes formula in delta-hedging
a hypothetical option, and in some cases performs
even better [because of the discreteness-error in
the Black-Scholes case arising from delta-hedging daily
instead of continuously]. To gauge the practical relevance
of our nonparametric pricing method, in Section 4
we apply the RBF pricing model to daily call
option prices on S&P 500 futures from 1987 to 1991
and compare its pricing and delta-hedging performance
to the naive Black-Scholes model. We find that in
many cases, the network pricing formula outperforms the
Black-Scholes model. We suggest several directions for
future research and conclude in Section 5.

2 Learning Networks: A Brief Review 

Over the past 15 years, a number of techniques have been
developed for modeling nonlinear statistical relations
nonparametrically. In particular, projection pursuit regression,
multilayer perceptrons [often called “backpropagation
networks” 2], and radial basis functions are three
popular examples of such techniques. Although originally
developed in different contexts for seemingly different
purposes, these techniques may all be viewed as
nonparametric methods for performing nonlinear regressions.
Following Barron and Barron (1988) we call this
general class of methods learning networks to emphasize
this unifying view and acknowledge their common history.
In the following sections, we shall provide a brief
review of their specification and properties. Readers already
familiar with these techniques may wish to proceed
immediately to the Monte Carlo simulation experiments
of Section 3.

1 However, since newly-created derivatives can often be
replicated by a combination of existing derivatives, this is
not as much of a limitation as it may seem at first.

2 More accurately, the term “backpropagation” is now typically
used to refer to the particular gradient descent method
of estimating parameters, while the term “multilayer perceptron”
is used to refer to the specific functional form described
below.

2.1 Standard Formulations 

In this section we describe the standard formulations of
the learning networks to be used in this paper. For expositional
simplicity, we shall focus our attention on the
problem of mapping multiple input variables into a univariate
output variable, much like regression analysis, although
the multivariate-output case is a straightforward
extension.

Given the well-known trade-offs between degrees of
freedom and approximation error in general statistical
inference, we shall also consider the number of parameters
implied by each model so that we can make comparisons
between them on a roughly equal footing. Note,
however, that the number of free parameters is a crude
measure of the complexity of nonlinear models, and more
refined measures may be available, e.g., the nonlinear
generalizations of the influence matrix in Wahba (1990).

A common way to visualize the structure of these networks
is to draw them as a graph showing the connections
between inputs, nonlinear “hidden” units, and outputs
[see Figure 1].

2.1.1 Radial Basis Functions 

Radial Basis Functions (RBFs) were first used to
solve the interpolation problem, that is, fitting a curve exactly
through a set of points [see Powell (1987) for a review].
More recently, the RBF formulation has been extended
by several researchers to perform the more general
task of approximation [see Broomhead and Lowe
(1988), Moody and Darken (1989) and Poggio and Girosi
(1990)]. In particular, Poggio and Girosi (1990) show
how RBFs can be derived from the classical regularization
problem in which some unknown function y = f(x)
is to be approximated given a sparse dataset (x_t, y_t) and
some smoothness constraints. In terms of our multiple-regression
analogy, the d-dimensional vector x_t may
be considered the “independent” or “explanatory” variables,
y_t the “dependent” variable, and f(·) the [possibly]
nonlinear function that is the conditional expectation
of y_t given x_t, hence:

$$ y_t = f(x_t) + \epsilon_t , \qquad E[\epsilon_t \mid x_t] = 0 . \tag{1} $$

The regularization [or “nonparametric estimation”] 
problem may then be viewed as the minimization of the 
following objective functional: 

$$ H(f) = \sum_{t=1}^{T} \left( \| y_t - f(x_t) \|^2 + \lambda \| P f(x_t) \|^2 \right) \tag{2} $$

where || · || is some vector norm and P is a differential
operator. The first term of the sum in (2) is simply
the distance between the approximation f(x_t) and the
observation y_t, the second term is a penalty function that
is a decreasing function of the smoothness of f(·), and λ
controls the trade-off between smoothness and fit.

In its most general form, and under certain conditions 
[see, for example, Poggio and Girosi (1990)], the solution 
to (2) is given by the following expression: 

$$ f(x) = \sum_{i=1}^{k} c_i h_i(\| x - z_i \|) + p(x) \tag{3} $$

where {z_i} are d-dimensional vector prototypes or “centers”,
{c_i} are scalar coefficients, {h_i} are scalar functions,
p(·) is a polynomial, and k is typically much less
than the number of observations T in the sample. Such
approximants have been termed “hyperbasis functions”
by Poggio and Girosi (1990) and are closely related to
splines, smoothers such as kernel estimators, and other
nonparametric estimators. 3

For our current purposes, we shall take the vector
norm to be a weighted Euclidean norm defined by a (d × d)
weighting matrix W, and the polynomial term shall be
taken to be just the linear and constant terms, yielding
the following specification for f(·):

$$ f(x) = \sum_{i=1}^{k} c_i h_i\left( (x - z_i)' W' W (x - z_i) \right) + \alpha_0 + \alpha_1' x \tag{4} $$

where α_0 and α_1 are the coefficients of the polynomial
p(·). Micchelli (1986) shows that a large class of basis
functions h_i(·) are appropriate, but the most common
choices for basis functions h(x) are Gaussians e^{−x/σ²} and
multiquadrics √(x + σ²).

Networks of this type can generate any real-valued
output, but in applications where we have some
a priori knowledge of the range of the desired outputs,
it is computationally more efficient to apply some nonlinear
transfer function to the outputs to reflect that
knowledge. This will be the case in our application to
derivative pricing models, in which some of the RBF
networks will be augmented with an “output sigmoid”,
which maps the range (−∞, ∞) into the fixed range
(0, 1). In particular, the augmented network will be of
the form g(f(x)) where g(u) = 1/(1 + e^{−u}).
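
As a minimal sketch of how specification (4) and the output sigmoid might be evaluated, consider the following, which assumes Gaussian basis functions with a common width. The function name `rbf_network`, its argument names, and the shared `sigma` are illustrative assumptions and not part of the formulation above.

```python
import math

def rbf_network(x, centers, coeffs, W, alpha0, alpha1, sigma=1.0, output_sigmoid=False):
    """Evaluate f(x) = sum_i c_i h((x - z_i)' W'W (x - z_i)) + alpha0 + alpha1'x.

    Assumes a Gaussian basis h(q) = exp(-q / sigma^2) shared by all centers.
    """
    # Linear and constant terms of the polynomial p(x)
    total = alpha0 + sum(a * xj for a, xj in zip(alpha1, x))
    for c, z in zip(coeffs, centers):
        diff = [xj - zj for xj, zj in zip(x, z)]
        # W (x - z_i): each row of W dotted with the difference vector
        wd = [sum(w * dj for w, dj in zip(row, diff)) for row in W]
        q = sum(v * v for v in wd)              # squared weighted distance
        total += c * math.exp(-q / sigma ** 2)  # Gaussian basis function
    # Optional output sigmoid g(u) = 1 / (1 + exp(-u))
    return 1.0 / (1.0 + math.exp(-total)) if output_sigmoid else total
```

With W set to the identity matrix the weighted norm reduces to the ordinary Euclidean norm, which is a convenient starting point before the entries of W'W are estimated.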

For a given set of inputs {x_t} and outputs {y_t}, RBF
approximation amounts to estimating the parameters of
the RBF network: the d(d+1)/2 unique entries of the
matrix W'W, the dk elements of the centers {z_i}, and
the d+k+1 coefficients α_0, α_1, and {c_i}. Thus the
total number of parameters that must be estimated for
d-dimensional inputs and k centers is dk + (d²/2) + (3d/2) +
k + 1.

3 To economize on terminology, in this paper we use the
term “radial basis functions” to encompass both the interpolation
techniques used by Powell and their subsequent
generalizations.



2.1.2 Multilayer Perceptrons 

Multilayer perceptrons (MLPs) are arguably the most
popular type of “neural network”, the general category
of methods that derive their original inspiration from
simple models of biological nervous systems. They were
developed independently by Parker (1985) and Rumelhart
et al. (1986) and popularized by the latter. Following
the notation of Section 2.1.1, a general formulation of
MLPs with univariate outputs may be written as follows:

$$ f(x) = h\left( \sum_{i=1}^{k} \delta_i h(\beta_{0i} + \beta_{1i}' x) + \delta_0 \right) \tag{5} $$

where h(·) is typically taken to be a smooth, monotonically
increasing function such as the “sigmoid” function
1/(1 + e^{−x}), {δ_i} and β are coefficients, and k is the
number of “hidden units”. The specification (5) is generally
termed an MLP with “one hidden layer” because
the basic “sigmoid-of-a-dot-product” equation is nested
once; the nesting may of course be repeated arbitrarily
many times, hence the term “multilayer” perceptron.
Unlike the RBF formulation, the nonlinear function h
in the MLP formulation is usually fixed for the entire
network.

For a given set of inputs {x_t} and outputs {y_t}, fitting
an MLP model amounts to estimating the (d+1)k
parameters {β_{0i}} and {β_{1i}}, and the k+1 parameters
{δ_i}, for a total of (d+2)k+1 parameters.
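
The nested “sigmoid-of-a-dot-product” structure of (5) can be sketched in a few lines. The names `sigmoid` and `mlp_one_hidden_layer` and the list-based parameter layout are our own illustrative choices, not a prescribed implementation.

```python
import math

def sigmoid(u):
    """The sigmoid transfer function h(u) = 1 / (1 + exp(-u))."""
    return 1.0 / (1.0 + math.exp(-u))

def mlp_one_hidden_layer(x, beta0, beta1, delta, delta0):
    """Evaluate f(x) = h(sum_i delta_i h(beta0_i + beta1_i'x) + delta0).

    beta0: k intercepts; beta1: k weight vectors of length d;
    delta: k output weights; delta0: output intercept.
    """
    # Hidden layer: one sigmoid of a dot product per hidden unit
    hidden = [sigmoid(b0 + sum(b * xj for b, xj in zip(b1, x)))
              for b0, b1 in zip(beta0, beta1)]
    # Output layer: the same h applied to a weighted sum of hidden units
    return sigmoid(sum(d * hi for d, hi in zip(delta, hidden)) + delta0)
```

Because the same h is applied at the output, this form always returns a value in (0, 1), which is one reason outputs are usually rescaled before fitting.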

2.1.3 Projection Pursuit Regression 

Projection pursuit is a method that emerged from
the statistics community for analyzing high-dimensional
datasets by looking at their low-dimensional projections.
Friedman and Stuetzle (1981) developed a version for
the nonlinear regression problem called projection pursuit
regression (PPR). Similar to MLPs, PPR models are
composed of projections of the data, i.e., dot products
of the data with estimated coefficients, but unlike MLPs
they also estimate the nonlinear combining functions
from the data. Following the notation of Section 2.1.2,
the formulation for PPR with univariate outputs can be
written as

$$ f(x) = \sum_{i=1}^{k} \delta_i h_i(\beta_i' x) + \delta_0 \tag{6} $$

where the functions h_i(·) are estimated from the data
[typically with a smoother], the {δ_i} and β are coefficients,
and k is the number of projections. Note that δ_0
is commonly taken to be the sample mean of the outputs
f(x).

In counting the number of parameters that PPR models
require, a difficulty arises in how to treat its use of
smoothers in estimating the inner h functions. A naive
approach is to count each smoothing estimator as a single
parameter, its bandwidth. In this case, the total
number of parameters is dk projection indices, k linear
coefficients, and k smoothing bandwidths, for a total of
(d + 2)k parameters. However, a more refined method
of counting the degrees of freedom, e.g., Wahba (1990),
may yield a slightly different count.
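
To make the comparison of model sizes concrete, the three parameter counts given in Sections 2.1.1 to 2.1.3 can be tabulated directly. The function names below are ours, and the value k = 4 in the usage note is a hypothetical choice for illustration only.

```python
def rbf_params(d, k):
    """d(d+1)/2 entries of W'W, dk center elements, d+k+1 coefficients."""
    return d * (d + 1) // 2 + d * k + (d + k + 1)

def mlp_params(d, k):
    """(d+1)k hidden-unit parameters plus k+1 output parameters."""
    return (d + 1) * k + (k + 1)

def ppr_params(d, k):
    """dk projection indices, k linear coefficients, k bandwidths (naive count)."""
    return d * k + k + k
```

For example, with d = 2 inputs and k = 4 nonlinear terms, these give 18, 17, and 16 parameters for the RBF, MLP, and PPR models respectively, so the three models are of roughly comparable size.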

2.2 Network Properties 

Although the various learning network techniques originated
from a variety of backgrounds, with implications
and characteristics that are not yet fully understood,
some common and well-established properties are worth
noting.

2.2.1 Approximation 

All of the above learning networks have been shown to
possess some form of a universal approximation property.
For example, Huber (1985) and Jones (1987) prove that
with sufficiently many terms, any square-integrable function
can be approximated arbitrarily well by PPR. Cybenko
(1988) and Hornik (1989) demonstrate that one-hidden-layer
MLPs can represent to arbitrary precision
most classes of linear and nonlinear continuous functions
with bounded inputs and outputs. Finally, Poggio and
Girosi (1990) show that RBFs can approximate arbitrarily
well any continuous function on a compact domain.
In a related vein, Poggio and Girosi also show that RBFs
have the “best” approximation property (there is always
a choice for the parameters that is better than any other
possible choice), a property that is not shared by MLPs.

2.2.2 Error Convergence 

The universal approximation results, however, say
nothing about how easy it is to find those good approximations,
or how computationally efficient they are. In
particular, does the number of data points we will need
to estimate the parameters of a network grow exponentially
with its size [the so-called “curse of dimensionality”]?
Recent results show that this is not necessarily
true if we are willing to restrict the complexity of the
function we wish to model. For example, Barron (1991)
derives bounds on the rate of convergence of the approximation
error in MLPs based on the number of examples,
given assumptions about the smoothness of the function
being approximated. Chen (1991) obtains similar results
for PPR. Girosi and Anzellotti (1992) derive bounds on
convergence in RBFs using somewhat more natural assumptions
about the smoothness of the function being
approximated. Niyogi and Girosi (1994) extend this result
for the estimation problem, and derive a bound on
the “generalization error” of RBFs, the error an RBF
network will make on unseen data.

The importance and centrality of generalization error
bounds to the process of data-driven modeling is worth
noting. In particular, these bounds show that for a fixed
number of data points, the generalization error that we
can expect from a network first decreases as the network
complexity (the number of parameters) increases, then after
a certain point the error increases [see Figure 2]. For
the financial modeling problems considered in this paper,
the data set size is, to some extent, fixed and thus these
results indicate that there will be an optimal number of
parameters to use for that size of data set.

Other interesting estimation properties have been investigated
for PPR in particular. Diaconis and Shahshahani
(1984) provide necessary and sufficient conditions
for functions to be represented exactly using PPR.
Donoho and Johnstone (1989) demonstrate the duality
between PPR and kernel regression in two dimensions,
and show that PPR is more parsimonious for modeling
functions with angular smoothness.

2.2.3 Model Specification 

A key question for most approximation techniques,
and in particular for neural network-like schemes, concerns
the type and the complexity of the model or the
network to be used for a specific problem. Different approaches
and different network architectures correspond
to different choices of the space of approximating functions.
A specific choice implies a specific assumption
about the nature of the nonlinear relation to be approximated.
For example, Girosi, Jones and Poggio (1993)
have shown that different assumptions about smoothness
of the function to be approximated lead to different
approximation schemes, such as different types of Radial
Basis Functions, as well as different kinds of splines and
of ridge approximators. Certain classes of smoothness
assumptions in the different variables even lead to multilayer
perceptron architectures. The number of basis
functions, and more generally of network parameters, is
a related and difficult issue. Even if one type of architecture
can be chosen based on prior knowledge about
the smoothness to be expected in the specific problem,
the question remains about the appropriate complexity
of the architecture, that is, the number of parameters. A
general answer does not yet exist and is unlikely to be
discovered any time soon. The standard approach to the
problem relies on cross-validation techniques and variations
of them [Wahba (1990)]. A related, more fundamental
approach, called structural risk minimization,
has been developed by Vapnik (1982).

2.2.4 Parameter Estimation Methods 

In our discussion above, we have focused primarily on
the specification of f(·) for each method, but of course
a critical concern is how each model’s parameters
are to be estimated. To some extent, the estimation
issue may be divorced from the specification issue. Indeed,
there is a large body of literature concerned solely
with the estimation of network parameters. Much of
this literature shows that the speed and accuracy of the
estimation process depend on the kind of derivative information
used, whether all parameters are estimated
simultaneously or sequentially, and whether all the data
is used at once in a “batch” mode or sequentially in an
“on-line” mode. In Hutchinson (1993), estimation techniques
for RBF networks are more fully explored.

However, a rigorous comparison of estimation methods
is not the primary goal of our paper; rather, our
objective is to see if any method can yield useful results.
As such we have adopted the most common estimation
schemes for our use of the other types of learning networks.
In particular we adopt Levenberg-Marquardt for
batch mode estimation of the RBF networks, gradient
descent [with momentum] for on-line mode estimation
of the MLP networks, and the Friedman and Stuetzle
algorithm for PPR [which uses a Newton method to compute
the projection directions and the “supersmoother”
for finding the nonlinear functions h].

Although not pursued here, readers interested in exploring
the trade-offs between on-line and batch-mode
estimation are encouraged to consult the “stochastic approximation”
literature [see Robbins and Monro (1951),
Ljung and Soderstrom (1986), and Widrow and Stearns
(1985)]. In general, it is not known why on-line methods
used with neural network techniques often seem to perform
better than batch methods on large-scale, nonconvex
problems. It seems difficult to extract any general
conclusions from the diverse body of literature reporting
the use of different on-line and batch techniques across
many disparate applications.

2.2.5 Equivalence of Different Learning Networks

There is another reason that we do not focus on the
merits of one type of learning network over another: recent
theoretical developments suggest that there are significant
connections between many of these networks.
For example, Maruyama, Girosi, and Poggio (1991) show
an equivalence between MLP networks with normalized
inputs and RBF networks. Girosi, Jones and Poggio
(1993) prove that a wide class of approximation
schemes can be derived from regularization theory, including
RBF networks and some forms of PPR and MLP
networks. Nevertheless, we expect each formulation to
be more efficient at approximating some functions than
others, and as argued by Ng and Lippman (1991), the
practical differences in using each method, e.g., in running
time or memory used, may be more important than
model accuracy.

3 Learning the Black-Scholes Formula 

Given the power and flexibility of learning networks to
approximate complex nonlinear relations, a natural application
is to derivative securities whose pricing formulas
are highly nonlinear even when they are available
in closed form. In particular, we pose the following
challenge: if option prices were truly determined by
the Black-Scholes formula exactly, can learning networks
“learn” the Black-Scholes formula? In more standard
statistical jargon: can the Black-Scholes formula be estimated
nonparametrically via learning networks with a
sufficient degree of accuracy to be of practical use?

In this section, we face this challenge by performing
Monte Carlo simulation experiments in which various
learning networks are trained on artificially generated
Black-Scholes option prices, and then compared to the
Black-Scholes formula both analytically and in out-of-sample
hedging experiments to see how close they come.
Even with training sets of only six months of daily data,
learning network pricing formulas can approximate the
Black-Scholes formula with remarkable accuracy.

While the accuracy of the learning network prices
is obviously of great interest, this alone is not sufficient
to ensure the practical relevance of our nonparametric
approach. In particular, the ability to hedge
an option position is as important, since the very existence
of an arbitrage-based pricing formula is predicated
on the ability to replicate the option through a
dynamic hedging strategy. This additional constraint
motivates the regularization techniques and, in particular,
the RBF networks used in this study. Specifically,
delta-hedging strategies require an accurate approximation
of the derivative of the underlying pricing formula,
and the need for accurate approximations of derivatives
leads directly to the smoothness constraint imposed by
regularization techniques such as RBF networks. 4 Of
course, whether or not the delta-hedging errors are sufficiently
small in practice is an empirical matter, and we
shall investigate these errors explicitly in our simulation
experiments and empirical application described below.
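
To illustrate why delta-hedging depends on the derivative of the pricing function, the following sketch approximates a delta by a central finite difference. It is illustrative only: the Black-Scholes formula serves as a stand-in for a fitted network, and the function names, the step size, and the values r = 5% and σ = 20% are our assumptions.

```python
import math

def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def pricing_function(moneyness, tau, r=0.05, sigma=0.20):
    """Stand-in for a fitted network: Black-Scholes call price per unit of strike."""
    d1 = (math.log(moneyness) + (r + 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    return moneyness * norm_cdf(d1) - math.exp(-r * tau) * norm_cdf(d2)

def numerical_delta(f, moneyness, tau, step=1e-5):
    """Central-difference estimate of the derivative of f in its first argument."""
    return (f(moneyness + step, tau) - f(moneyness - step, tau)) / (2.0 * step)

def analytic_delta(moneyness, tau, r=0.05, sigma=0.20):
    """Black-Scholes delta Phi(d1), used here only as a benchmark."""
    d1 = (math.log(moneyness) + (r + 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))
    return norm_cdf(d1)
```

For a smooth pricing function the two deltas agree closely; a fitted function that matches prices well but is not smooth could produce a much noisier finite-difference delta, which is the concern that motivates the smoothness constraint.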

However, the accuracy we desire cannot be achieved
without placing some structure on the function to be
approximated. For example, we begin by asserting that
the option pricing formula f(·) is smooth in all its arguments,
and that its arguments are: the stock price
S(t), the strike price X, and the time-to-maturity T − t.
In fact, we know that the Black-Scholes formula also depends
on the risk-free rate of interest r and the volatility
σ of the underlying asset’s continuously-compounded returns,
e.g.,

$$ C(t) = S(t)\,\Phi(d_1) - X e^{-r(T-t)}\,\Phi(d_2) \tag{7} $$

where

$$ d_1 = \frac{\ln(S(t)/X) + (r + \sigma^2/2)(T-t)}{\sigma\sqrt{T-t}} , \qquad d_2 = d_1 - \sigma\sqrt{T-t} , $$

and Φ(·) is the standard normal cumulative distribution
function. However, if r and σ are fixed throughout the
network’s training sample as we shall assume, then the
dependence of the option’s price on these two quantities
cannot be identified by any nonparametric estimator of
f(·) in the way that (7) does. 5 Of course, if interest rates
and volatility vary through time as they do in practice,
learning networks can readily capture their impact on
option prices explicitly.

4 In fact, it is well known that the problem of numerical
differentiation is ill-posed. The classical approach [Reinsch
(1967)] is to regularize it by finding a sufficiently smooth
function that solves the variational problem in (2). As we discussed
earlier, RBF networks as well as splines and several
forms of MLP networks follow directly from the regularization
approach and are therefore expected to approximate not
only the pricing formula but also its derivatives [provided the
basis function corresponding to a smoothness prior is of a sufficient
degree, see (Poggio and Girosi, 1991): in particular,
the Gaussian is certainly sufficiently smooth for our problem].
A special case of this general argument is the result of Gallant
and White (1992) and Hornik, Stinchcombe, and White
(1990) who show that single-hidden-layer MLP networks can
approximate the derivative of an arbitrary nonlinear mapping
arbitrarily well as the number of hidden units increases.
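
For reference, the Black-Scholes call price in (7) and its delta can be computed with the standard library alone. This is a sketch rather than the code used for the experiments; the function names are ours, and the inputs in the usage note (including r = 5%) are hypothetical.

```python
import math

def norm_cdf(z):
    """Standard normal cumulative distribution function Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def black_scholes_call(S, X, tau, r, sigma):
    """Return (price, delta) of a European call with time-to-maturity tau in years."""
    d1 = (math.log(S / X) + (r + 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    price = S * norm_cdf(d1) - X * math.exp(-r * tau) * norm_cdf(d2)
    delta = norm_cdf(d1)  # derivative of the call price with respect to S
    return price, delta
```

As a check, `black_scholes_call(100.0, 100.0, 1.0, 0.05, 0.20)` gives a price of about 10.45 and a delta of about 0.637.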

One further simplification we employ is to assume that
the statistical distribution of the underlying asset’s return
is independent of the level of the stock price S(t),
hence by Theorem 8.9 of Merton (1990, Chapter 8), the
option pricing formula f(·) is homogeneous of degree
one in both S(t) and X, so that we need only estimate
f(S(t)/X, 1, T − t). By requiring only two rather than
three inputs to our learning networks we may be lessening
the number of data points required for learning, but
it should also be possible to relax these assumptions and
use all three inputs.
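
The homogeneity property can be checked numerically for the Black-Scholes case: pricing with inputs (S, X) gives the same result as X times the price with inputs (S/X, 1). The sketch below uses hypothetical values for S, X, T − t, r, and σ, and the function name `bs_call` is our own.

```python
import math

def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def bs_call(S, X, tau, r=0.05, sigma=0.20):
    """Black-Scholes call price; r and sigma are illustrative defaults."""
    d1 = (math.log(S / X) + (r + 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    return S * norm_cdf(d1) - X * math.exp(-r * tau) * norm_cdf(d2)

S, X, tau = 55.0, 50.0, 0.5
price_full = bs_call(S, X, tau)                  # three-input form f(S, X, T - t)
price_normalized = X * bs_call(S / X, 1.0, tau)  # two-input form X * f(S/X, 1, T - t)
```

The two prices agree up to floating-point rounding, which is why a network trained on S/X and T − t alone loses no information under these assumptions.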

We can now outline the components of our Monte
Carlo simulation experiment, which consists of two
phases: training and testing. The training phase entails
generating sample paths of stock and option prices
on which the learning networks are “trained”, i.e., the
network parameters are fitted to each sample path so
as to minimize a quadratic loss function. This yields a
network pricing formula which is then “tested” on newly-simulated
sample paths of stock and option prices, i.e.,
various performance measures are calculated for the network
pricing formula using the test path.

To obtain a measure of the success of the “average” 
network pricing formula, we repeat the training phase 
for many independent option/stock price sample paths, 
apply each network formula to the same test path, and 
average the performance measures across training paths. 
To obtain a measure of the “average success” of any 
given network pricing formula, we do the reverse: for 
a single training path, we apply the resulting network 
pricing formula on many independent option/stock price 
test paths, and average the performance measures across 
test paths. 

Since we conduct multiple training-path and test-path
simulations, our simulation design is best visualized as
a matrix of results: each row corresponds to a separate
and independent training path, each column corresponds
to a separate and independent test path, and each cell
contains the performance measures for a network trained
on a particular training path and applied to a particular
test path. Therefore, the “average success” of a given
network may be viewed as an average of the performance
measures across the columns of a given row, and the performance
of the “average network” on a given test path
may be viewed as an average of the performance measures
across the rows of a given column. Although these
two averages are obviously closely related, they do address
different aspects of the performance of learning networks,
and the results of each must be interpreted with the appropriate
motivation in mind.

5 This is one sense in which analytical pricing formulas for
derivative securities are preferred whenever available.
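
The two kinds of averages correspond to row means and column means of the results matrix. A small sketch follows; the error values are made up purely to show the arithmetic, and the function names are ours.

```python
def row_means(matrix):
    """Average across test paths for each training path (one value per row)."""
    return [sum(row) / len(row) for row in matrix]

def column_means(matrix):
    """Average across training paths for each test path (one value per column)."""
    return [sum(col) / len(col) for col in zip(*matrix)]

# Hypothetical performance measures: rows are training paths, columns are test paths.
errors = [
    [1.0, 3.0],
    [2.0, 6.0],
]
average_success_per_network = row_means(errors)          # [2.0, 4.0]
average_network_per_test_path = column_means(errors)     # [1.5, 4.5]
```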

3.1 Calibrating the Simulations 

In the first phase of our Monte Carlo simulation 
experiment—the training phase—we simulate a two-year 
sample of daily stock prices, and create a cross-section 
of options each day according to the rules used by the 
Chicago Board Options Exchange (CBOE) with prices 
given by the Black-Scholes formula. We refer to this 
two-year sample of stock and [multiple] option prices as 
a single “training path”, since the network is trained on 
this sample. 

We assume that the underlying asset for our simulation
experiments is a “typical” NYSE stock, with
an initial price S(0) of $50.00, an annual continuously-compounded
expected rate of return μ of 10%, and an
annual volatility σ of 20%. Under the Black-Scholes assumption
of a geometric Brownian motion:

$$ dS(t) = \mu S(t)\,dt + \sigma S(t)\,dW(t) \tag{8} $$

and taking the number of days per year to be 253,
we draw 506 pseudorandom variates Z_t from the distribution
N(μ/253, σ²/253) to obtain two years of daily
continuously-compounded returns, which are converted
to prices via

$$ S(t) = S(0) \exp\left( \sum_{i=1}^{t} Z_i \right) $$

for t > 0.
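
A minimal sketch of this simulation step, using the parameter values stated above, is given below. The function name, the seed argument, and the use of Python's `random` module are our own choices and not a description of the original implementation.

```python
import math
import random

def simulate_stock_path(s0=50.0, mu=0.10, sigma=0.20, n_days=506,
                        days_per_year=253, seed=0):
    """Simulate daily prices S(t) = S(0) * exp(sum of daily log returns Z_i)."""
    rng = random.Random(seed)                        # seeded for reproducibility
    mean = mu / days_per_year                        # daily mean of Z_t
    std = math.sqrt(sigma ** 2 / days_per_year)      # daily standard deviation of Z_t
    prices = [s0]
    log_sum = 0.0
    for _ in range(n_days):
        log_sum += rng.gauss(mean, std)              # cumulate log returns
        prices.append(s0 * math.exp(log_sum))
    return prices
```

The returned list has 507 entries: the initial price followed by one price for each of the 506 simulated trading days.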

Given a simulated training path {S'(t)} of daily stock 
prices, we construct a corresponding path of option 
prices according to the rules of the Chicago Board 
Options Exchange (CBOE) for introducing options on 
stocks. Since a thorough description of these rules is un¬ 
necessary for our purposes, we summarize only the most 
salient features here. 6 At any one time, CBOE stock op¬ 
tions outstanding on a particular stock have four unique 
expiration dates: the current month, the next month, 
and the following two expirations from a quarterly sched¬ 
ule. The CBOE sets strike prices at multiples of $5 for 
stock prices in the $25 to $200 range, which all of our 
simulated prices fall into. When options expire and a 
new expiration date is introduced, the two strike prices 
closest to the current stock price are used. If the current 
price is very close to one of those strike prices—within $1 
in our simulations—a third strike price is used to better 
bracket the current price. If the stock price moves
outside of the current strike-price range, another strike
price is generally added for all expiration dates to
bracket that price. 7 We assume that all of the options
generated according to these rules are traded every day,
although in practice, far-from-the-money and long-dated
options are often very illiquid.

6 See Hull (1993) for more details.
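The strike-selection rule for a newly introduced expiration can be sketched as follows. The function `initial_strikes` is hypothetical, and the choice of which side receives the third strike (the side nearer to the current price) is an assumption; the text only states that a third strike is added when the price is within $1 of one of the two bracketing strikes.

```python
import math


def initial_strikes(price, step=5.0, near=1.0):
    """Strikes opened for a new expiration, given the current stock price."""
    lower = math.floor(price / step) * step      # multiple of $5 at or below price
    upper = lower + step                         # next multiple of $5 above
    strikes = [lower, upper]
    if price - lower <= near:                    # price hugs the lower strike
        strikes.insert(0, lower - step)          # assumed: extend downward
    elif upper - price <= near:                  # price hugs the upper strike
        strikes.append(upper + step)             # assumed: extend upward
    return strikes


print(initial_strikes(52.3))
print(initial_strikes(50.0))
print(initial_strikes(54.5))
```

The later addition of strikes when the price leaves the current range is not covered by this sketch.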

A typical training path is shown in Figure 3. We can
also plot the training path as a 3-dimensional surface if
we normalize stock and option prices by the appropriate
strike price and consider the option price as a function
of the form \(f(S/X, 1, T - t)\) [see Figure 4]. Because
the options generated for a particular sample path are
a function of the [random] stock price path, the size of
this data matrix [in terms of number of options and total
number of data points] varies across sample paths. For
our training set, the number of options per sample path
ranges from 71 to 91, with an average of 81. The total
number of data points ranges from 5,227 to 6,847, with
an average of 6,001.
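The normalization by the strike price works because the Black-Scholes call price is homogeneous of degree one in the stock price and the strike price. The sketch below checks this numerically. The risk-free rate of 5% and the sample inputs are assumptions chosen for illustration, and `bs_call` is a plain implementation of the standard formula written for this example.

```python
import math


def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call(s, x, tau, r, sigma):
    """Black-Scholes call price; tau is time to expiration in years."""
    d1 = (math.log(s / x) + (r + 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    return s * norm_cdf(d1) - x * math.exp(-r * tau) * norm_cdf(d2)


# Illustrative inputs (r = 5% is an assumption, not taken from this section).
s, x, tau, r, sigma = 52.0, 50.0, 0.25, 0.05, 0.20
direct = bs_call(s, x, tau, r, sigma) / x          # C / X
normalized = bs_call(s / x, 1.0, tau, r, sigma)    # f(S/X, 1, T - t)
print(direct, normalized)
```

Both numbers agree up to floating-point rounding, so a network only needs the two inputs S/X and T - t to represent the price surface.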

3.2 Training Network Pricing Formulas 

We are now ready to estimate, or train, pricing formulas
of the form \(f(S/X, 1, T - t)\) on the simulated training
paths, using two “inputs”: \(S(t)/X\) and \(T - t\). For
comparison, we first estimate two simple linear models
using ordinary least squares (OLS). The first model is a
linear regression of the option price on \(S(t)/X\) and
\(T - t\). The second is a pair of linear regressions, one
for options currently in the money, and another for those
currently out of the money. Typical estimates of these
models are shown in Table 2.

Although these linear models seem to fit quite well,
with \(R^2\) values well above 80%, they have particularly
naive implications for delta-hedging strategies. In
particular, delta-hedging with the first linear model would
amount to purchasing a certain number of shares of stock
in the beginning [0.6886 in the example in Table 2] and
holding them until expiration, regardless of stock price
movements during the option’s life. The second linear
model improves on this slightly by switching between
hedging with a large number [0.9415 in Table 2b] and a
small number of shares [0.1882 in Table 2c] depending
on whether the current stock price is greater than or less
than the strike price.

The nonlinear models obtained from learning net¬ 
works, on the other hand, yield estimates of option prices 
and deltas that are difficult to distinguish visually from 
the true Black-Scholes values. An example of the esti¬ 
mates and errors for an RBF network is shown in Fig¬ 
ure 5, which was estimated from the same data as the 
linear models from Table 2. The estimated equation for 
this particular RBF network is shown in Table 1. Observe
from Table 1 that the centers in the RBF model
are not constrained to lie within the range of the inputs,
and in fact the third and fourth centers in our example
do not. The largest errors in these networks tend
to occur at the kink-point for options at the money at
expiration, and also along the boundary of the sample
points.

7 In our simulations, this was not done for options with
less than one week to expiration.

PPR and MLP networks of similar complexity gener¬ 
ate similar response surfaces, although as we shall see in 
the next section, each method has its own area of the 
input space that it models slightly more accurately than 
the others. 

Our choice of model-complexity is not arbitrary, and 
in fact is motivated by our desire to minimize error and 
maximize “fit” for out-of-sample data. In this regard, a 
critical issue in specifying learning networks is how many 
nonlinear terms—“hidden units”, basis functions, pro¬ 
jections, etc.—to use in the approximation. Following 
the discussion in Section 2.2.2, for actual market data, 
we might expect an optimal number of parameters that 
minimizes out-of-sample error. But in the simulations 
of this section, the data are noise-free [in the sense that 
there is a deterministic formula generating the outputs 
from the inputs], hence we are interested primarily in 
how quickly adding more parameters reduces the error. 
Preliminary out-of-sample tests with independent sam¬ 
ple paths have indicated diminishing returns beyond 4 
nonlinear terms [as measured by the percent of variance 
explained], thus we adopt this specification for all the 
learning networks considered in this paper. 8 In the next 
sections we will assess how well we have done in meeting 
our goal of minimizing out-of-sample error. 

3.3 Performance Measures 

Our learning networks estimate the option prices
\(\widehat{C/X}\), thus our first performance measure is
simply the usual coefficient of determination, \(R^2\), of
those estimated values compared with the true option
prices \(C/X\), computed for the out-of-sample data.

However, the \(R^2\) measure is not ideal for telling us
the practical value of any improvement in pricing accu¬ 
racy that the learning networks might give us. A more 
meaningful measure of performance for a given option 
pricing formula is the “tracking error” of various repli¬ 
cating portfolios designed to delta-hedge an option posi¬ 
tion, using the formula in question to calculate the hedge 
ratios or deltas. In particular, suppose at date 0 we sell 
one call option and undertake the usual dynamic trad¬ 
ing strategy in stocks and bonds to hedge this call during 
its life. If we have correctly identified the option pricing 
model, and if we can costlessly and continuously hedge, 
then at expiration the combined value of our stock and 
bond positions should exactly offset the value of the call. 
The difference between the terminal value of the call and 
the terminal combined value of the stock and bond po¬ 
sitions may then serve as a measure of the accuracy of 
our network approximation. Of course, since it is
impossible to hedge continuously in practice, there will
always be some tracking error due to discreteness, therefore
we shall compare the RBF tracking error with the tracking
error of discrete delta-hedging under the exact Black-
Scholes formula.

8 Four nonlinear terms correspond to approximately 20 total
parameters.

More formally, denote by \(V(t)\) the dollar value of our
replicating portfolio at date \(t\) and let

\[ V(t) = V_S(t) + V_B(t) + V_C(t) \tag{9} \]

where \(V_S(t)\) is the dollar value of stocks, \(V_B(t)\) is the
dollar value of bonds, and \(V_C(t)\) is the dollar value of
call options held in the portfolio at date \(t\). The initial
composition of this portfolio at date 0 is assumed to be:

\[ V_S(0) = S(0)\,\Delta_{RBF}(0) \tag{10} \]

\[ V_C(0) = -F_{BS}(0) \tag{11} \]

\[ V_B(0) = -\bigl( V_S(0) + V_C(0) \bigr) \tag{12} \]

where \(F_{BS}(\cdot)\) is the Black-Scholes call option pricing
formula, \(F_{RBF}(\cdot)\) is its RBF approximation, and

\[ \Delta_{RBF}(t) = \frac{\partial F_{RBF}(t)}{\partial S} . \]


The portfolio positions (10) - (12) represent the sale of
one call option at date 0, priced according to the
theoretical Black-Scholes formula \(F_{BS}(0)\), and the
simultaneous purchase of \(\Delta_{RBF}(0)\) shares of stock at
price \(S(0)\), where \(\Delta_{RBF}(0)\) is the derivative of the
RBF approximation \(F_{RBF}(0)\) with respect to the stock
price. 9 Since the stock purchase is wholly financed by the
combination of riskless borrowing and proceeds from the
sale of the call option, the initial value of the replicating
portfolio is identically zero, thus

\[ V(0) = V_S(0) + V_B(0) + V_C(0) = 0 . \]

Prior to expiration, and at discrete and regular intervals
of length \(\tau\) [which we take to be one day in our
simulations], the stock and bond positions in the replicating
portfolio will be rebalanced so as to satisfy the following
relations:

\[ V_S(t) = S(t)\,\Delta_{RBF}(t), \tag{13} \]

\[ V_B(t) = e^{r\tau} V_B(t-\tau) - S(t) \bigl( \Delta_{RBF}(t) - \Delta_{RBF}(t-\tau) \bigr) \tag{14} \]


where \(t = k\tau < T\) for some integer \(k\). The tracking
error of the replicating portfolio is then defined to be
the value of the replicating portfolio \(V(T)\) at expiration
date \(T\). From this, we obtain the following performance
measure:

\[ \xi = e^{-rT} E\bigl[ |V(T)| \bigr] . \tag{15} \]

The quantity \(\xi\) is simply the present value of the
expected absolute tracking error of the replicating
portfolio. Although for more complex option portfolios, \(\xi\)
may not be the most relevant criterion, nevertheless \(\xi\)
does provide some information about the accuracy of our
option pricing formula. 10

9 Note that for the RBF and MLP learning networks, \(\Delta\)
can be computed analytically by taking the derivative of the
network approximation. For PPR, however, the use of a
smoother for estimating the nonlinear functions \(h\) forces a
numerical approximation of \(\Delta\), which we accomplish with
a first-order finite-difference with an increment \(dS\) of size
1/1000 of the range of \(S\).
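The discrete hedging recursion (10) - (14) and the measure (15) can be sketched as below. Because no trained network is available in this illustration, the Black-Scholes delta stands in for the network delta through the `delta_fn` argument. The risk-free rate of 5%, the seed, the use of 200 rather than 500 test paths, and all function names are assumptions made for this example.

```python
import math
import random


def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def _d1(s, x, tau, r, sigma):
    return (math.log(s / x) + (r + 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))


def bs_call(s, x, tau, r, sigma):
    d1 = _d1(s, x, tau, r, sigma)
    return s * norm_cdf(d1) - x * math.exp(-r * tau) * norm_cdf(d1 - sigma * math.sqrt(tau))


def bs_delta(s, x, tau, r, sigma):
    """Stand-in for the network delta in this sketch."""
    return norm_cdf(_d1(s, x, tau, r, sigma))


def tracking_error(path, x, r, sigma, dt, delta_fn):
    """Return V(T) for one short call hedged along path[0..n]."""
    n = len(path) - 1
    big_t = n * dt
    delta = delta_fn(path[0], x, big_t, r, sigma)
    v_s = path[0] * delta                           # (10)
    v_c = -bs_call(path[0], x, big_t, r, sigma)     # (11)
    v_b = -(v_s + v_c)                              # (12)
    for k in range(1, n):                           # rebalance while t < T
        new_delta = delta_fn(path[k], x, big_t - k * dt, r, sigma)
        v_b = math.exp(r * dt) * v_b - path[k] * (new_delta - delta)  # (14)
        delta = new_delta
    v_b *= math.exp(r * dt)                         # final interval, no trade
    v_s = path[-1] * delta                          # (13) valued at expiration
    v_c = -max(path[-1] - x, 0.0)                   # payoff owed on the short call
    return v_s + v_b + v_c


def simulate_path(rng, s0, mu, sigma, n_days, days_per_year=253):
    path, log_sum = [s0], 0.0
    for _ in range(n_days):
        log_sum += rng.gauss(mu / days_per_year, sigma / math.sqrt(days_per_year))
        path.append(s0 * math.exp(log_sum))
    return path


rng = random.Random(1)
r, sigma, mu, x, n_days, dt = 0.05, 0.20, 0.10, 50.0, 63, 1.0 / 253
errors = [tracking_error(simulate_path(rng, 50.0, mu, sigma, n_days),
                         x, r, sigma, dt, bs_delta) for _ in range(200)]
xi_hat = math.exp(-r * n_days * dt) * sum(abs(v) for v in errors) / len(errors)
print(f"estimated xi over {len(errors)} paths: {xi_hat:.4f}")
```

Replacing `bs_delta` with the derivative of a fitted network gives the corresponding network tracking error on the same paths.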

A third measure of performance may be defined by
combining the information contained in the expected
tracking error with the variance of the tracking error.
In particular, we define the “prediction error” \(\eta\) as:

\[ \eta = e^{-rT} \sqrt{ E^2[V(T)] + \mathrm{Var}[V(T)] } \tag{16} \]

which is the present value of the square root of the sum
of the squared expected tracking error and its variance.
The inclusion of the variance of \(V(T)\) is significant: the
expected tracking error of a delta-hedging strategy might
be zero, but the strategy is a poor one if the variance
of the tracking error were large. We shall use all three
measures \(R^2\), \(\xi\), and \(\eta\) in our performance analysis below.
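Sample versions of the two hedging measures can be computed as in the sketch below. The function names, the risk-free rate of 5%, and the use of the population variance are assumptions made here. The inputs are the five absolute tracking errors reported in Table 4 for the network trained on training path #1, used only to have concrete numbers. Since the squared mean plus the variance equals the second moment, the prediction error reduces to a discounted root-mean-square tracking error, and it is the same whether it is computed from the tracking error or from its absolute value.

```python
import math
import statistics


def xi_hat(errors, r, big_t):
    """Discounted mean absolute tracking error, as in (15)."""
    return math.exp(-r * big_t) * statistics.fmean([abs(v) for v in errors])


def eta_hat(errors, r, big_t):
    """Discounted sqrt(mean^2 + variance), as in (16), with population variance."""
    mean = statistics.fmean(errors)
    var = statistics.pvariance(errors, mu=mean)
    return math.exp(-r * big_t) * math.sqrt(mean ** 2 + var)


# Absolute tracking errors from Table 4, row "Train #1" (five test paths only).
sample = [0.6968, 0.2719, 0.1154, 0.0018, 0.5870]
r, big_t = 0.05, 0.25            # r is an assumption; 3 months to expiration
xi = xi_hat(sample, r, big_t)
eta = eta_hat(sample, r, big_t)
rms = math.exp(-r * big_t) * math.sqrt(statistics.fmean([v * v for v in sample]))
print(xi, eta, rms)
```

The last two printed values coincide up to rounding, which is the identity noted above.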

3.4 Testing Network Pricing Formulas 

To assess the quality of the RBF pricing formula ob¬ 
tained from each training path, we simulate an inde¬ 
pendent six-month sample of daily stock prices—a “test 
path”—and use the trained network to delta-hedge var¬ 
ious options [individually, not as a portfolio] introduced 
at the start of the test path. By simulating many
independent test paths, 500 in our case, and averaging the
absolute tracking errors over these paths, we can obtain
estimates \(\hat{\xi}\) and \(\hat{\eta}\) of the expected absolute
tracking error \(\xi\) and the prediction error \(\eta\) for each
of the ten network pricing formulas. The performance of the
network delta-hedging strategy may then be compared to
the performance of a delta-hedging strategy using the
Black-Scholes formula.

3.4.1 Out-of-Sample \(R^2\) Comparisons

As a preliminary check of out-of-sample performance,
we observe that the pricing errors of the direct model
outputs \(\widehat{C/X}\) are typically quite small for all of
the networks examined, with out-of-sample \(R^2\) values of
99% and above for the “average” network [except for the
single linear model]. These results are presented in Table 3.
From the minimum \(R^2\) values, it is also evident that not
all types of networks yield consistently good results,
perhaps because of the stochastic nature of the respective
estimation processes.

10 In particular, other statistics of the sample path \(\{V(t)\}\)
for the entire portfolio may be of more concern, such as its
maximum and minimum, and the interaction between \(\{V(t)\}\)
and other asset returns.


3.4.2 Tracking Error Comparisons 

Table 4 reports selected raw simulation results for a
call option with 3 months to expiration and a strike price
X of $50. In each row, the absolute tracking errors for
delta-hedging this option are reported for the network
pricing formula trained on a single training path, the
entries in each column corresponding to a different test
path for which the absolute tracking error is calculated.
For example, the (1, 2)-entry 0.2719 is the absolute
tracking error for delta-hedging this 3-month $50-strike
option over test path #200, using the network pricing
formula trained on training path #1.

For comparison, over the same test path the abso¬ 
lute tracking error for a delta-hedging strategy using the 
Black-Scholes formula is 0.3461, reported in the last row. 
The fact that the RBF network pricing formula can yield 
a smaller delta-hedging error than the Black-Scholes for¬ 
mula may seem counterintuitive. After all, the Black- 
Scholes formula is indeed the correct pricing formula in 
the context of our simulations. The source of this appar¬ 
ent paradox lies in the fact that we are delta-hedging dis¬ 
cretely [once a day], whereas the Black-Scholes formula 
is based on a continuously-adjusted delta-hedging strat¬ 
egy. Therefore, even the Black-Scholes formula will ex¬ 
hibit some tracking error when applied to Black-Scholes 
prices at discrete time intervals. In such cases, an RBF 
pricing formula may well be more accurate since it is 
trained directly on the discretely-sampled data, and not 
based on a continuous-time approximation. 

Of course, other columns in Table 4 show that Black- 
Scholes can perform significantly better than the RBF 
formula [for example, compare the (1, 1)-entry of 0.6968
with the Black-Scholes value of 0.0125]. Moreover, as the 
delta-hedging interval shrinks, the Black-Scholes formula 
will become increasingly more accurate and, in the limit, 
will have no tracking error whatsoever. However, since 
such a limit is empirically unattainable for a variety of 
institutional reasons, the benefits of network pricing for¬ 
mulas may be quite significant. 

For a more complete comparison between RBF net¬ 
works and the Black-Scholes formula across all 500 test 
paths, Table 5 reports the fraction of test paths for which 
each of the ten RBF networks exhibits lower absolute
tracking error than the Black-Scholes formula. Similar 
comparisons are also performed for the single-regression 
model [“Linear-1”], the two-regression model [“Linear- 
2”], a projection pursuit regression [“PPR”] with four 
projections, and a multilayer perceptron [“MLP”] with 
one hidden layer containing four units. 

The third column of entries in Table 5 shows that in
approximately 36 percent of the 500 test paths, RBF
networks have lower tracking error than the Black-Scholes
formula. For this particular option, RBF networks and
PPR networks have quite similar performance, and both
are superior to the three other pricing models; the next
closest competitor is the MLP, which outperforms the
Black-Scholes formula for approximately 26 percent of
the test paths.
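The "fraction of test paths" statistic is a simple count. As a small illustration, the sketch below applies it to the five test paths listed in Table 4 for the network trained on training path #1. The variable names are chosen here, and five paths are of course far too few to reproduce the 500-path fractions quoted in the text.

```python
# Absolute tracking errors from Table 4, keyed by test path number.
rbf_train1 = {100: 0.6968, 200: 0.2719, 300: 0.1154, 400: 0.0018, 500: 0.5870}
black_scholes = {100: 0.0125, 200: 0.3461, 300: 0.0059, 400: 0.0677, 500: 0.0492}

# Test paths on which the network hedge tracks better than Black-Scholes.
wins = [p for p in rbf_train1 if rbf_train1[p] < black_scholes[p]]
fraction = len(wins) / len(rbf_train1)
print(wins, fraction)
```

On these five paths the network has the smaller error on test paths #200 and #400.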

Of course, tracking errors tend to vary with the terms 
of the option such as its time-to-maturity and strike 
price. To gauge the accuracy of the RBF and other pric¬ 
ing models across these terms, we report in Tables 6-10 
the fraction of test paths for which each of the four pric¬ 
ing models outperforms Black-Scholes for strike prices 
X = 40, 45, 50, 55, and 60, and times-to-maturity 
T—t = 1, 3, and 6 months. 

Table 6 shows that the average RBF network— 
averaged over the ten training paths—performs reason¬ 
ably well for near-the-money options at all three maturi¬ 
ties, outperforming Black-Scholes between 12% and 36% 
of the time for options with strike prices between $45 
and $55. As the maturity increases, the performance of 
the average RBF network improves for deep-out-of-the-money
options as well, outperforming Black-Scholes for
30% of the test paths for the call with a strike price of 
$60. 

Tables 7 and 8 provide similar comparisons for
the average MLP and PPR networks, respectively— 
averaged over the same training paths as the RBF 
model—with similar results: good performance for near- 
the-money options at all maturities, and good perfor¬ 
mance for deep-out-of-the-money options at longer ma¬ 
turities. 

Not surprisingly, Tables 9 and 10 show that the linear
models exhibit considerably weaker performance than any
of the network models, with fractions of outperforming
test paths between 0.0% and 10.3% for the single-regression
model, and between 0.0% and 14.6% for the
two-regression model. However, these results do offer one
important insight: even simple linear models can sometimes,
albeit rarely, outperform the Black-Scholes model
when delta-hedging is performed at a daily frequency.

Finally, it is important to note that network pricing
formulas should be monitored carefully for extrapolation. 
Because the networks are trained on a sampling of points 
covering a specific region of input space, it should not be 
surprising that they may not perform as well on points 
outside of this region. For example, Figure 6 illustrates 
that the worst tracking error for RBF networks in our 
simulations occurred for test data that was well outside 
of the range of the training data. 

3.4.3 Prediction Error Comparisons 

To complete our performance analysis of the network
option pricing formulas, we compare the estimated
prediction errors \(\hat{\eta}\) of the network delta-hedging
strategies to those of the Black-Scholes formula. Recall from
(16) that the prediction error combines the expectation
and variance of the absolute tracking error, hence the
estimated prediction error is calculated with the sample
mean and sample variance of \(|V(T)|\), taken over the
500 test paths. The benchmarks for comparison are the
estimated prediction errors for the Black-Scholes
delta-hedging strategy, given in Table 11.

Once again, we see from Table 11 that delta-hedging 
with the Black-Scholes formula at discrete intervals does not yield
a perfect hedge. The estimated prediction errors are 
all strictly positive, and are larger for options near the 
money and with longer times-to-maturity. 

However, under the prediction error performance 
measure the Black-Scholes formula is superior to all of 
the learning network approaches for this simulated data 
[see Tables 12 - 16]. For example, these tables show that 
the average RBF network has larger estimated prediction 
errors than Black-Scholes for all option types [although 
RBF networks have smaller errors than the other learn¬ 
ing network types] and that the linear models are signif¬ 
icantly worse than the others. 11 We also note that the 
pattern of errors is somewhat different for each learning 
network, indicating that each may have its own area of 
dominance. 

Overall, we are encouraged by the ease with which the 
learning networks achieved error levels similar to those 
of the Black-Scholes formula, and on a problem posed in 
the latter’s favor. We suspect that the learning network 
approach will be a promising alternative for pricing and 
hedging derivatives where there is uncertainty about the 
specification of the asset return process. 

4 An Application to S&P 500 Futures Options

In Section 3 we have shown that learning networks can 
efficiently approximate the Black-Scholes pricing formula 
if the data were generated by it, and this provides some 
hope that our nonparametric approach may be useful in 
practice. After all, if there is some uncertainty about the 
parametric assumptions of a typical derivative pricing 
model, it should come as no surprise that a nonparamet¬ 
ric model can improve pricing and hedging performance. 
To gauge the practical relevance of learning networks in
at least one context, we apply them to the pricing and
hedging of S&P 500 futures options, and compare them to the
Black-Scholes model applied to the same data. Despite
the fact that the Black-Scholes model is generally not 
used in its original form in practice, we focus on it here 
because it is still a widely-used benchmark model, and 
because it serves as an example of a parametric model 
whose assumptions are questionable in the context of 
this data. 

11 We caution the reader against drawing too strong a
conclusion from the ordering of the RBF, MLP, and PPR results,
however, due to the sensitivity of these nonparametric
techniques to the “tuning” of their specifications, e.g., number
of hidden nodes, network architecture, etc. In particular,
the superiority of the RBF network results may be due to
the fact that we have had more experience in tuning their
specification.



4.1 The Data and Experimental Setup 

The data for our empirical analysis are daily closing 
prices of S&P 500 futures and futures options for the 
5-year period from January 1987 to December 1991. Fu¬ 
tures prices over this period are shown in Figure 7. There 
were 24 different futures contracts and 998 futures call 
options active during this period. 12 The futures con¬ 
tracts have quarterly expirations, and on a typical day 
40 to 50 call options based on 4 different futures con¬ 
tracts were traded. 

Our specification is similar to that given in Section 3.1 
for the simulated data. We divide the S&P 500 data 
into 10 non-overlapping six-month subperiods for train¬ 
ing and testing the learning networks. Six-month subpe¬ 
riods were chosen to match approximately the number of 
data points in each training path with those of our sim¬ 
ulations in Section 3. Data for the second half of 1989 is 
shown in Figures 8 and 9. Notable differences between 
this data and the simulated data of Section 3 are the 
presence of “noise” in the real data and the irregular 
trading activity of the options, especially for near-term 
out-of-the-money options. 

For the S&P 500 data, the number of futures call 
options per subperiod ranged from 70 to 179, with an 
average of 137. The total number of data points per 
subperiod ranged from 4,454 to 8,301, with an average 
of 6,246. To limit the effects of nonstationarities and 
to avoid data-snooping, we trained a separate learning 
network on each of the first 9 subperiods, and tested 
those networks only on the data from the immediately 
following subperiod, thus yielding 9 test paths for each 
network. We also considered the last 7 test paths sep¬ 
arately, i.e., data from July 1988 to December 1991, to 
assess the influence of the October 1987 crash on our 
results. 
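The train-on-one-subperiod, test-on-the-next design can be written down explicitly. The sketch below assumes, consistently with the references to "the second half of 1989" and to July 1988 in the text, that the ten subperiods are calendar half-years; the labels such as `1987H1` are naming conventions introduced here.

```python
# Ten six-month subperiods, January 1987 through December 1991.
subperiods = [f"{year}H{half}" for year in range(1987, 1992) for half in (1, 2)]

# Each network is trained on one subperiod and tested on the next one only.
pairs = list(zip(subperiods[:-1], subperiods[1:]))

# The last seven test paths, whose test data start in July 1988.
post_crash = pairs[-7:]

for train, test in pairs:
    print(f"train on {train}, test on {test}")
```

This gives nine train/test pairs, with the final seven forming the subsample considered separately.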

4.2 Estimating Black-Scholes Prices 

Estimating and comparing models on the S&P 500 data 
will proceed much as it did in Section 3 for the linear and 
learning network models. However, the Black-Scholes 
parameters \(r\) and \(\sigma\) must be estimated when using actual
market data. From a theoretical perspective, the Black- 
Scholes model assumes that both of these parameters are 
constant over time, and thus we might be tempted to 
estimate them using all available past data. Few practi¬ 
tioners adopt this approach, however, due to substantial 
empirical evidence of nonstationarities in interest rates 
and asset-return distributions. A common compromise 
is to estimate the parameters using only a window of 
the most recent data. We follow this latter approach for 
the S&P 500 data. Specifically, we estimate the Black-
Scholes volatility \(\sigma\) for a given S&P 500 futures
contract using

\[ \sigma = s / \sqrt{60} \tag{17} \]

where \(s\) is the standard deviation of the 60 most recent
continuously-compounded daily returns of the contract.
We approximate the risk-free rate \(r\) for each futures
option as the yield of the 3-month Treasury bill on the close
of the month before the initial activity in that option [see
Figure 10].

12 For simplicity, we focus only on call options in our
analysis.
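The rolling-window estimate can be sketched as follows. The function `window_volatility` is a hypothetical helper that applies (17) literally as printed, taking the sample standard deviation of the last 60 daily log returns and dividing by the square root of 60; no further scaling convention is assumed. The price series is synthetic and serves only to exercise the function.

```python
import math
import statistics


def window_volatility(prices, window=60):
    """Apply (17) to the most recent `window` daily log returns."""
    if len(prices) < window + 1:
        raise ValueError("need at least window + 1 prices")
    recent = prices[-(window + 1):]
    returns = [math.log(b / a) for a, b in zip(recent, recent[1:])]
    s = statistics.stdev(returns)        # sample standard deviation
    return s / math.sqrt(window)         # formula (17) as printed


# Synthetic prices whose log returns alternate between +1% and -1%.
prices = [100.0]
for i in range(60):
    prices.append(prices[-1] * math.exp(0.01 if i % 2 == 0 else -0.01))

vol = window_volatility(prices)
print(vol)
```

Calling the function on each day's trailing price history yields the time-varying input used for the benchmark formula.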

4.3 Out-of-Sample Pricing and Hedging 

In this section we present the out-of-sample results of fit¬ 
ting the various models to the S&P 500 data. Based on 
our experience with the simulated data, we chose learn¬ 
ing networks with 4 nonlinear terms as a good compro¬ 
mise between accuracy and complexity, although it may 
be worth re-examining this trade-off on actual S&P 500 
data. 13 

The out-of-sample tests show some evidence that the 
learning networks outperform the naive Black-Scholes 
model on this data. This is hardly surprising, given the 
fact that many of the assumptions of the Black-Scholes 
formula are violated by the data, e.g., geometric Brown¬ 
ian motion, constant volatility, frictionless markets, etc. 

As with the simulated-data-trained learning networks,
the performance of each of the actual-data-trained networks
varied over the input space. To see how the performance
varies in particular, we divide each dimension of the
input space into three regimes: long-, medium-, and
short-term for the time-to-expiration \(T - t\) input, and in-,
near-, and out-of-the-money for the stock-price/strike-
price \(S/X\) input. Specifically, breakpoints of 2 and 5
months for the \(T - t\) input and 0.97 and 1.03 for the \(S/X\)
input were chosen to yield approximately the same number
of datapoints in each of the 9 paired categories. The
delta-hedging prediction errors, broken down by these
maturity/richness groups, are shown in Tables 17 and 18.
Interestingly, results from the subperiods influenced by
the October 1987 crash still yield lower prediction errors
for the learning networks than for the Black-Scholes
model, except for near-term in-the-money options.
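The nine maturity/richness groups follow directly from the stated breakpoints. In the sketch below the function names and the treatment of values that fall exactly on a breakpoint are assumptions; the text gives only the breakpoints themselves.

```python
def maturity_regime(months_to_expiration):
    """Short-, medium-, or long-term, with breakpoints at 2 and 5 months."""
    if months_to_expiration < 2.0:
        return "short"
    if months_to_expiration < 5.0:
        return "medium"
    return "long"


def moneyness_regime(s_over_x):
    """Out-of-, near-, or in-the-money call, with breakpoints 0.97 and 1.03."""
    if s_over_x < 0.97:
        return "out"
    if s_over_x <= 1.03:
        return "near"
    return "in"


def group(s_over_x, months_to_expiration):
    return moneyness_regime(s_over_x), maturity_regime(months_to_expiration)


print(group(1.00, 3.0))
print(group(1.10, 1.0))
print(group(0.90, 6.0))
```

Prediction errors can then be aggregated per group by using the returned pair as a dictionary key.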

For completeness we also show the out-of-sample \(R^2\)
values [see Table 19] and the absolute hedging error
comparison [see Table 20] as we did in Section 3.4 for the
synthetic data. Table 19, for instance, shows that the
average out-of-sample \(R^2\) of roughly 85% for the estimated
Black-Scholes model is somewhat worse than that of the
network models. Note however that unlike the
case for our synthetic data, the options in the S&P 500
data set are not independent, and thus we must look at
these results with caution. Furthermore, we only have
one test set for each trained network, and thus for the
hedging error comparison in Table 20 we show these
results broken down by test period instead of the summary
statistics shown in Section 3.4.2. Nonetheless, this table
shows that the learning networks exhibit less hedging
error than the estimated Black-Scholes formula in a
substantial fraction of the options tested: up to 65% of the
options tested against the MLP network for the July -
December 1990 testing period.

13 A sample re-use technique such as cross-validation would
be appropriate in this context for choosing the number of
nonlinear terms.

From these results, it is difficult to infer which net¬ 
work type performs best in general. Hypothesis tests 
concerning the relative sizes of hedging error are diffi¬ 
cult to formulate precisely because of the statistical de¬ 
pendence of the option-price paths. Focusing on a sin¬ 
gle non-overlapping sequence of options would solve the 
dependence problem, but would throw out 98% of the 
available options. Instead, we present a less formal test 
on all of the data, but caution the reader not to give 
it undue weight. Since we have hedging errors for each 
option and learning network, we can use a paired t-test 
to compare the Black-Scholes absolute hedging error on 
each option with the network’s absolute hedging error 
on the same option. The null hypothesis is that the av¬ 
erage difference of the two hedging errors is zero, and 
the [one-sided] alternative hypothesis is that the differ¬ 
ence is positive, i.e., the learning network hedging error 
is smaller. Results of this simple test show evidence that 
all three learning networks outperform the Black-Scholes 
model, while the linear models do not [see Table 21]. 
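The paired test statistic described above can be computed as in the following sketch. The function name is chosen here, and the inputs are the five absolute tracking errors from Table 4 [simulated data, network trained on training path #1], used only so that the arithmetic has concrete numbers; they say nothing about the S&P 500 results in Table 21. A p-value would additionally require the Student t distribution, which is left out to keep the example within the standard library.

```python
import math
import statistics


def paired_t_statistic(bs_errors, net_errors):
    """t-statistic and degrees of freedom for the mean of (BS - network)."""
    diffs = [b - n for b, n in zip(bs_errors, net_errors)]
    n = len(diffs)
    mean_diff = statistics.fmean(diffs)
    std_err = statistics.stdev(diffs) / math.sqrt(n)
    return mean_diff / std_err, n - 1


# Table 4: Black-Scholes row and "Train #1" row, test paths #100 to #500.
bs_errors = [0.0125, 0.3461, 0.0059, 0.0677, 0.0492]
net_errors = [0.6968, 0.2719, 0.1154, 0.0018, 0.5870]

t_stat, dof = paired_t_statistic(bs_errors, net_errors)
print(t_stat, dof)
```

A positive statistic favors the network under the one-sided alternative; on these five simulated paths the statistic is negative, since Black-Scholes has the smaller error on three of them by a wide margin.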

It is also interesting to compare the computing time
required to estimate these models, although no effort
was made to optimize our code, nor did we attempt to
optimize the estimation method for each type of learning
network. With these qualifications in mind, we find
that second-order methods seem preferred for our
application. For example, the MLP network gradient descent
equations were updated for 10,000 iterations, requiring
roughly 300 minutes per network on a multiuser
SUN SPARCstation II, while the Levenberg-Marquardt
method for the RBF networks used from 10 to 80 iterations
and took roughly 7 minutes per network. Similarly,
the PPR networks [with a Newton method at the core]
took roughly 120 minutes per network.

5 Conclusions 

Although parametric derivative pricing formulas are pre¬ 
ferred when they are available, our results show that 
nonparametric learning-network alternatives can be use¬ 
ful substitutes when parametric methods fail. While our 
findings are promising, we cannot yet claim that our 
approach will be successful in general—for simplicity, 
our simulations have focused only on the Black-Scholes 
model, and our application has focused only on a single 
instrument and time period, S&P 500 futures options for 
1987 to 1991. In particular, there are a host of paramet¬ 
ric derivative pricing models, as well as many practical 
extensions of these models that may improve their
performance on any particular data set. We hope to provide
a more comprehensive analysis of these alternatives in
the near future.

However, we do believe there is reason to be cautiously 
optimistic about our general approach, with a number 
of promising directions for future research. Perhaps the 
most pressing item on this agenda is the specification of 
additional inputs, inputs that are not readily captured 
by parametric models such as the return on the market, 
general market volatility, and other measures of busi¬ 
ness conditions. A related issue is the incorporation of 
the predictability of the underlying asset’s return, and 
cross-predictability among several correlated assets [see 
Lo and Wang (1993) for a parametric example]. This 
may involve the construction of a factor model of the 
underlying asset’s return and volatility processes. 

Other research directions are motivated by the need 
for proper statistical inference in the specification of 
learning networks. First, we require some method of 
matching the network architecture—number of nonlin¬ 
ear units, number of centers, type of basis functions, 
etc.—to the specific dataset at hand in some optimal 
[and, preferably, automatic] fashion. 

Second, the relation between sample size and approx¬ 
imation error should be explored, either analytically 
or through additional Monte Carlo simulation experi¬ 
ments. Perhaps some data-dependent metric can be con¬ 
structed, such as the model prediction error, that can 
provide real-time estimates of approximation errors in 
much the same way that standard errors may be ob¬ 
tained for typical statistical estimators. 

And finally, the need for better performance measures 
is clear. While typical measures of goodness-of-fit such
as \(R^2\) do offer some guidance for model selection, they
are only incomplete measures of performance. Moreover, 
the notion of degrees of freedom is no longer well-defined 
for nonlinear models, and this has implications for all 
statistical measures of fit. 
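\[
\begin{aligned}
\widehat{C/X} ={} & -0.06 \sqrt{
  \begin{bmatrix} S/X - 1.35 \\ T-t - 0.45 \end{bmatrix}'
  \begin{bmatrix} 59.79 & -0.03 \\ -0.03 & 10.24 \end{bmatrix}
  \begin{bmatrix} S/X - 1.35 \\ T-t - 0.45 \end{bmatrix} + 2.55 } \\
& - 0.03 \sqrt{
  \begin{bmatrix} S/X - 1.18 \\ T-t - 0.24 \end{bmatrix}'
  \begin{bmatrix} 59.79 & -0.03 \\ -0.03 & 10.24 \end{bmatrix}
  \begin{bmatrix} S/X - 1.18 \\ T-t - 0.24 \end{bmatrix} + 1.97 } \\
& + 0.03 \sqrt{
  \begin{bmatrix} S/X - 0.98 \\ T-t + 0.20 \end{bmatrix}'
  \begin{bmatrix} 59.79 & -0.03 \\ -0.03 & 10.24 \end{bmatrix}
  \begin{bmatrix} S/X - 0.98 \\ T-t + 0.20 \end{bmatrix} + 0.00 } \\
& + 0.10 \sqrt{
  \begin{bmatrix} S/X - 1.05 \\ T-t + 0.10 \end{bmatrix}'
  \begin{bmatrix} 59.79 & -0.03 \\ -0.03 & 10.24 \end{bmatrix}
  \begin{bmatrix} S/X - 1.05 \\ T-t + 0.10 \end{bmatrix} + 1.62 } \\
& + 0.14\, S/X - 0.24\,(T-t) - 0.01 .
\end{aligned}
\]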

Table 1: Example estimated RBF equation from Section 3.2. 
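The estimated equation in Table 1 can be evaluated directly. The sketch below reads each nonlinear term as a multiquadric, i.e. the square root of a quadratic form plus an offset, with the numbers transcribed from the table; the coefficients there are rounded to two decimals, so the outputs are only approximate. Treating \(T - t\) as measured in years, the function names, and the evaluation point are assumptions made for this example. Because the call price equals \(X\) times a function of \(S/X\) and \(T - t\), the delta with respect to \(S\) is the partial derivative of that function with respect to \(S/X\), which is available in closed form.

```python
import math

# Numbers transcribed from Table 1 (rounded to two decimals in the table).
A = ((59.79, -0.03), (-0.03, 10.24))     # shared weighting matrix
UNITS = (                                # (weight, center S/X, center T-t, offset)
    (-0.06, 1.35, 0.45, 2.55),
    (-0.03, 1.18, 0.24, 1.97),
    (0.03, 0.98, -0.20, 0.00),
    (0.10, 1.05, -0.10, 1.62),
)
LINEAR = (0.14, -0.24, -0.01)            # S/X coefficient, T-t coefficient, constant


def _quad(dm, dtau):
    """Quadratic form [dm, dtau] A [dm, dtau]'."""
    return A[0][0] * dm * dm + 2.0 * A[0][1] * dm * dtau + A[1][1] * dtau * dtau


def rbf_price(m, tau):
    """Estimated C/X at moneyness m = S/X and time to expiration tau."""
    total = LINEAR[0] * m + LINEAR[1] * tau + LINEAR[2]
    for w, cm, ctau, b in UNITS:
        total += w * math.sqrt(_quad(m - cm, tau - ctau) + b)
    return total


def rbf_delta(m, tau):
    """Analytic derivative of rbf_price with respect to m (the hedge ratio)."""
    total = LINEAR[0]
    for w, cm, ctau, b in UNITS:
        dm, dtau = m - cm, tau - ctau
        root = math.sqrt(_quad(dm, dtau) + b)
        total += w * (A[0][0] * dm + A[0][1] * dtau) / root
    return total


m, tau, h = 1.0, 0.25, 1e-6              # at the money, three months (assumed years)
price = rbf_price(m, tau)
delta = rbf_delta(m, tau)
finite_diff = (rbf_price(m + h, tau) - rbf_price(m - h, tau)) / (2.0 * h)
print(price, delta, finite_diff)
```

The third term has a zero offset, so its derivative is undefined exactly at its center, which lies at a negative time to expiration and therefore outside the region where the formula would be used.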


Residual Standard Error = 0.027, \(R^2\) = 0.9098, N = 6782
\(F_{2,6779}\)-statistic = 34184.97, p-value = 0

             coef      std.err    t-stat       p-value
Intercept   -0.6417    0.0028    -231.4133     0
S/X          0.6886    0.0027     259.4616     0
T-t          0.0688    0.0018      38.5834     0

(a) Single linear model.


Residual Standard Error = 0.0062, \(R^2\) = 0.9955, N = 3489
\(F_{2,3486}\)-statistic = 385583.4, p-value = 0

             coef      std.err    t-stat       p-value
Intercept   -0.9333    0.0012    -763.6280     0
S/X          0.9415    0.0011     875.0123     0
T-t          0.0858    0.0006     150.6208     0

(b) “In-the-money” linear model.


Residual Standard Error = 0.007, \(R^2\) = 0.8557, N = 3293
\(F_{2,3290}\)-statistic = 9753.782, p-value = 0

             coef      std.err    t-stat       p-value
Intercept   -0.1733    0.0022     -80.3638     0
S/X          0.1882    0.0023      80.6965     0
T-t          0.0728    0.0007     108.2335     0

(c) “Out-of-the-money” linear model.


Table 2: Regression summaries for typical linear models.

         Linear-1   Linear-2    RBF       PPR       MLP       B-S
Min       14.72      94.34      98.58     55.23     76.60    100.00
Mean      83.40      99.27      99.95     99.08     99.48    100.00
Max       95.57      99.82      99.99    100.00     99.96    100.00

Table 3: Out-of-sample R² values [in percent] for the learning networks, summarized across all training and 
out-of-sample test sets. “Linear-1” refers to the single-regression model of the data; “Linear-2” refers to the 
two-regression model, one for in-the-money options and one for out-of-the-money options; “RBF” refers to a 
radial-basis-function network with 4 multiquadric centers and an output sigmoid; “PPR” refers to a projection 
pursuit regression with four projections; and “MLP” refers to a multilayer perceptron with a single hidden layer 
containing four units. 


             Test #100   Test #200   Test #300   Test #400   Test #500
Train #1      0.6968      0.2719      0.1154      0.0018      0.5870
Train #2      0.6536      0.2667      0.0882      0.0903      0.5523
Train #3      0.6832      0.2622      0.0698      0.0370      0.5534
Train #4      0.7175      0.2682      0.0955      0.0155      0.5918
Train #5      0.6938      0.2767      0.1055      0.0229      0.5993
Train #6      0.6755      0.2692      0.1085      0.0083      0.5600
Train #7      0.6971      0.2690      0.1104      0.0054      0.5809
Train #8      0.7075      0.2717      0.1087      0.0022      0.5859
Train #9      0.6571      0.2652      0.1016      0.0013      0.5389
Train #10     0.7105      0.2706      0.1135      0.0038      0.5913
B-S           0.0125      0.3461      0.0059      0.0677      0.0492

Table 4: Simulations of absolute delta-hedging errors for RBF networks for an at-the-money call option with X = 50, 
T − t = 3 months, and Black-Scholes price $2.2867. The current stock price S(0) is assumed to be $50. The last row 
displays the same errors for the Black-Scholes formula. 



             Linear-1   Linear-2    RBF       PPR       MLP
Train #1      0.062      0.102      0.354     0.362     0.260
             (0.011)    (0.014)    (0.021)   (0.021)   (0.020)
Train #2      0.048      0.112      0.340     0.390     0.264
             (0.010)    (0.014)    (0.021)   (0.022)   (0.020)
Train #3      0.088      0.108      0.380     0.350     0.268
             (0.013)    (0.014)    (0.022)   (0.021)   (0.020)
Train #4      0.084      0.098      0.370     0.340     0.254
             (0.012)    (0.013)    (0.022)   (0.021)   (0.019)
Train #5      0.062      0.100      0.358     0.360     0.278
             (0.011)    (0.013)    (0.021)   (0.021)   (0.020)
Train #6      0.056      0.108      0.364     0.378     0.274
             (0.010)    (0.014)    (0.022)   (0.022)   (0.020)
Train #7      0.084      0.102      0.368     0.362     0.272
             (0.012)    (0.014)    (0.022)   (0.021)   (0.020)
Train #8      0.080      0.104      0.358     0.328     0.262
             (0.012)    (0.014)    (0.021)   (0.021)   (0.020)
Train #9      0.066      0.104      0.368     0.374     0.272
             (0.011)    (0.014)    (0.022)   (0.022)   (0.020)
Train #10     0.080      0.104      0.354     0.382     0.280
             (0.012)    (0.014)    (0.021)   (0.022)   (0.020)

Table 5: Fraction of 500 test sets in which the absolute delta-hedging error was lower than Black-Scholes for an 
at-the-money call option with X = 50, T − t = 3 months, and Black-Scholes price $2.2867 [standard errors are given 
in parentheses]. The current stock price S(0) is assumed to be $50. 
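
The parenthesized standard errors in Table 5 appear consistent with the usual binomial formula sqrt(p(1 − p)/n) applied to each fraction with n = 500 test paths. This is our reading of the numbers rather than something stated in the caption, and the function name below is hypothetical. A minimal Python sketch of the check:

```python
import math


def binomial_se(p_hat, n):
    """Standard error of a sample proportion p_hat over n independent trials."""
    return math.sqrt(p_hat * (1.0 - p_hat) / n)


# Train #1 entries of Table 5 for the RBF and Linear-1 columns.
se_rbf = binomial_se(0.354, 500)      # table reports (0.021)
se_linear1 = binomial_se(0.062, 500)  # table reports (0.011)
```

Under this assumption the standard error depends only on the fraction itself, which would explain why entries with similar fractions carry identical standard errors.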

RBF          X = 40    X = 45    X = 50    X = 55    X = 60

T-t = 1
  Mean        0.001     0.120     0.278     0.266     0.032
  (SE)       (0.000)   (0.005)   (0.006)   (0.006)   (0.003)
  Min         0.000     0.108     0.270     0.176     0.022
  Max         0.002     0.140     0.284     0.332     0.040
T-t = 3
  Mean        0.072     0.296     0.361     0.269     0.254
  (SE)       (0.004)   (0.006)   (0.007)   (0.006)   (0.006)
  Min         0.054     0.242     0.340     0.248     0.170
  Max         0.084     0.336     0.380     0.322     0.336
T-t = 6
  Mean        0.164     0.263     0.316     0.243     0.304
  (SE)       (0.005)   (0.006)   (0.007)   (0.006)   (0.007)
  Min         0.120     0.220     0.298     0.234     0.276
  Max         0.200     0.310     0.324     0.258     0.320

Table 6: Fraction of 500 test sets in which the absolute delta-hedging error using an RBF network with 4 multiquadric 
centers and an output sigmoid is lower than the Black-Scholes delta-hedging error, for call options with strike price 
X and time-to-maturity T − t months on a non-dividend-paying stock currently priced at $50. Within each panel, the 
top entry of each column is the average of this fraction across the 10 training paths, the second entry [in parentheses] 
is the standard error of that average, and the third and fourth entries are the minimum and maximum across the 10 
training paths. 
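
To make the construction of each panel concrete, the sketch below aggregates the ten RBF fractions reported in Table 5 (X = 50, T − t = 3 months) into the mean, minimum, and maximum that Table 6 reports for the same option. The standard error is computed with a pooled binomial formula over all 10 × 500 test paths; that choice is our assumption (it happens to reproduce the printed value), and the function name is hypothetical.

```python
import math

# RBF column of Table 5: one fraction (out of 500 test paths) per training path.
rbf_fractions = [0.354, 0.340, 0.380, 0.370, 0.358,
                 0.364, 0.368, 0.358, 0.368, 0.354]


def summarize(fractions, paths_per_fraction=500):
    """Return (mean, standard error, min, max) across training paths."""
    mean = sum(fractions) / len(fractions)
    n_total = paths_per_fraction * len(fractions)
    # Assumption: binomial standard error pooled over all test paths.
    se = math.sqrt(mean * (1.0 - mean) / n_total)
    return mean, se, min(fractions), max(fractions)


mean, se, lowest, highest = summarize(rbf_fractions)
```

Rounded to three decimals, these values agree with the T − t = 3 entries that Table 6 reports for the at-the-money strike: 0.361, (0.007), 0.340, and 0.380.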


MLP          X = 40    X = 45    X = 50    X = 55    X = 60

T-t = 1
  Mean        0.000     0.046     0.238     0.125     0.019
  (SE)       (0.000)   (0.003)   (0.006)   (0.005)   (0.002)
  Min         0.000     0.034     0.228     0.110     0.008
  Max         0.000     0.066     0.246     0.132     0.028
T-t = 3
  Mean        0.022     0.174     0.268     0.354     0.280
  (SE)       (0.002)   (0.005)   (0.006)   (0.007)   (0.006)
  Min         0.004     0.130     0.254     0.324     0.216
  Max         0.040     0.220     0.280     0.386     0.384
T-t = 6
  Mean        0.030     0.187     0.252     0.330     0.253
  (SE)       (0.002)   (0.006)   (0.006)   (0.007)   (0.006)
  Min         0.004     0.152     0.204     0.298     0.216
  Max         0.074     0.212     0.302     0.354     0.274

Table 7: Fraction of 500 test sets in which the absolute delta-hedging error using an MLP network with a single 
hidden layer containing four units is lower than the Black-Scholes delta-hedging error, for call options with strike 
price X and time-to-maturity T − t months on a non-dividend-paying stock currently priced at $50. See Table 6 for 
details. 

PPR          X = 40    X = 45    X = 50    X = 55    X = 60

T-t = 1
  Mean        0.000     0.165     0.316     0.303     0.024
  (SE)       (0.000)   (0.005)   (0.007)   (0.006)   (0.002)
  Min         0.000     0.118     0.272     0.208     0.006
  Max         0.002     0.198     0.394     0.364     0.052
T-t = 3
  Mean        0.060     0.282     0.363     0.325     0.177
  (SE)       (0.003)   (0.006)   (0.007)   (0.007)   (0.005)
  Min         0.006     0.202     0.328     0.244     0.076
  Max         0.126     0.344     0.390     0.420     0.286
T-t = 6
  Mean        0.125     0.287     0.315     0.293     0.197
  (SE)       (0.005)   (0.006)   (0.007)   (0.006)   (0.006)
  Min         0.020     0.190     0.290     0.234     0.116
  Max         0.202     0.346     0.352     0.358     0.286

Table 8: Fraction of 500 test sets in which the absolute delta-hedging error using a PPR network with four projections 
is lower than the Black-Scholes delta-hedging error, for call options with strike price X and time-to-maturity T − t 
months on a non-dividend-paying stock currently priced at $50. See Table 6 for details. 


Linear-1     X = 40    X = 45    X = 50    X = 55    X = 60

T-t = 1
  Mean        0.000     0.020     0.103     0.016     0.002
  (SE)       (0.000)   (0.002)   (0.004)   (0.002)   (0.001)
  Min         0.000     0.012     0.068     0.010     0.002
  Max         0.000     0.032     0.124     0.026     0.002
T-t = 3
  Mean        0.003     0.029     0.071     0.018     0.007
  (SE)       (0.001)   (0.002)   (0.004)   (0.002)   (0.001)
  Min         0.000     0.016     0.048     0.010     0.006
  Max         0.010     0.060     0.088     0.032     0.012
T-t = 6
  Mean        0.012     0.035     0.039     0.037     0.019
  (SE)       (0.002)   (0.003)   (0.003)   (0.003)   (0.002)
  Min         0.010     0.026     0.024     0.034     0.010
  Max         0.016     0.046     0.050     0.042     0.026

Table 9: Fraction of 500 test sets in which the absolute delta-hedging error using a single-regression model is lower 
than the Black-Scholes delta-hedging error, for call options with strike price X and time-to-maturity T − t months 
on a non-dividend-paying stock currently priced at $50. See Table 6 for details. 


Linear-2     X = 40    X = 45    X = 50    X = 55    X = 60

T-t = 1
  Mean        0.000     0.080     0.146     0.068     0.004
  (SE)       (0.000)   (0.004)   (0.005)   (0.004)   (0.001)
  Min         0.000     0.060     0.128     0.058     0.004
  Max         0.000     0.090     0.170     0.092     0.004
T-t = 3
  Mean        0.018     0.107     0.104     0.095     0.033
  (SE)       (0.002)   (0.004)   (0.004)   (0.004)   (0.003)
  Min         0.010     0.088     0.098     0.080     0.020
  Max         0.024     0.116     0.112     0.112     0.052
T-t = 6
  Mean        0.045     0.082     0.072     0.082     0.059
  (SE)       (0.003)   (0.004)   (0.004)   (0.004)   (0.003)
  Min         0.032     0.074     0.056     0.068     0.038
  Max         0.054     0.090     0.080     0.096     0.072

Table 10: Fraction of 500 test sets in which the absolute delta-hedging error using a two-regression model is lower 
than the Black-Scholes delta-hedging error, for call options with strike price X and time-to-maturity T − t months 
on a non-dividend-paying stock currently priced at $50. See Table 6 for details. 

B-S          X = 40    X = 45    X = 50    X = 55    X = 60

T-t = 1       0.001     0.069     0.217     0.116     0.007
T-t = 3       0.043     0.146     0.213     0.155     0.098
T-t = 6       0.088     0.157     0.208     0.211     0.147

Table 11: Estimated prediction errors for the absolute tracking error of a delta-hedging strategy using the 
Black-Scholes formula, for call options with strike price X and time-to-maturity T − t months on a non-dividend-paying 
stock currently priced at $50, estimated across 500 independent test paths. Since the Black-Scholes parameters are 
assumed to be known, not estimated, these errors do not vary across training paths. 
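
For readers who want to see how a tracking error of this kind can be simulated, the following Python sketch delta-hedges a short call with the Black-Scholes delta along simulated geometric Brownian motion paths and averages the discounted absolute terminal portfolio value. It is only a sketch under stated assumptions: the interest rate, drift, volatility, number of rebalancing dates, and all function names are illustrative choices of ours, and the summary statistic is not claimed to coincide with the prediction-error measure tabulated here.

```python
import math
import random


def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_price_delta(s, x, tau, r, sigma):
    """Black-Scholes call price and delta; tau is time to maturity in years."""
    if tau <= 0.0:
        return max(s - x, 0.0), (1.0 if s > x else 0.0)
    vol = sigma * math.sqrt(tau)
    d1 = (math.log(s / x) + (r + 0.5 * sigma ** 2) * tau) / vol
    price = s * norm_cdf(d1) - x * math.exp(-r * tau) * norm_cdf(d1 - vol)
    return price, norm_cdf(d1)


def terminal_hedge_value(s0, x, t, r, mu, sigma, steps, rng):
    """Short one call, hold delta shares, finance the rest with bonds."""
    dt = t / steps
    s = s0
    price, delta = bs_price_delta(s, x, t, r, sigma)
    bond = price - delta * s  # total portfolio value is zero at time 0
    for k in range(1, steps + 1):
        shock = rng.gauss(0.0, 1.0)
        s *= math.exp((mu - 0.5 * sigma ** 2) * dt + sigma * math.sqrt(dt) * shock)
        bond *= math.exp(r * dt)
        if k < steps:  # rebalance the stock position, paid for from the bond
            new_delta = bs_price_delta(s, x, t - k * dt, r, sigma)[1]
            bond -= (new_delta - delta) * s
            delta = new_delta
    return delta * s + bond - max(s - x, 0.0)


def mean_abs_tracking_error(s0=50.0, x=50.0, t=0.25, r=0.05, mu=0.10,
                            sigma=0.20, steps=63, n_paths=500, seed=0):
    """Discounted mean of |V(T)| over n_paths (all defaults are assumptions)."""
    rng = random.Random(seed)
    total = sum(abs(terminal_hedge_value(s0, x, t, r, mu, sigma, steps, rng))
                for _ in range(n_paths))
    return math.exp(-r * t) * total / n_paths


error = mean_abs_tracking_error()
```

With continuous rebalancing the terminal value would be exactly zero under the Black-Scholes assumptions; the nonzero error here comes entirely from rebalancing at discrete dates.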


RBF          X = 40    X = 45    X = 50    X = 55    X = 60

T-t = 1
  Mean        0.044     0.164     0.310     0.157     0.039
  (SE)       (0.003)   (0.002)   (0.002)   (0.001)   (0.001)
  Min         0.031     0.150     0.298     0.152     0.035
  Max         0.059     0.172     0.316     0.163     0.045
T-t = 3
  Mean        0.142     0.215     0.296     0.257     0.155
  (SE)       (0.008)   (0.002)   (0.001)   (0.001)   (0.001)
  Min         0.113     0.208     0.291     0.249     0.152
  Max         0.177     0.225     0.299     0.263     0.161
T-t = 6
  Mean        0.286     0.271     0.309     0.340     0.214
  (SE)       (0.011)   (0.006)   (0.002)   (0.002)   (0.001)
  Min         0.236     0.243     0.299     0.329     0.207
  Max         0.334     0.300     0.315     0.347     0.224

Table 12: Estimated prediction errors for the absolute tracking error of a delta-hedging strategy using an RBF network 
with 4 multiquadric centers and an output sigmoid, for call options with strike price X and time-to-maturity T − t 
months on a non-dividend-paying stock currently priced at $50, estimated across 500 independent test paths. Within 
each panel, the top entry of each column is the average of the estimated prediction error across the 10 training paths, 
the second entry [in parentheses] is the standard error of that average, and the third and fourth entries are the 
minimum and maximum across the 10 training paths. 


MLP          X = 40    X = 45    X = 50    X = 55    X = 60

T-t = 1
  Mean        0.214     0.264     0.389     0.209     0.060
  (SE)       (0.024)   (0.008)   (0.006)   (0.004)   (0.002)
  Min         0.124     0.228     0.365     0.194     0.050
  Max         0.386     0.314     0.429     0.234     0.075
T-t = 3
  Mean        0.690     0.323     0.366     0.285     0.178
  (SE)       (0.118)   (0.016)   (0.003)   (0.004)   (0.002)
  Min         0.271     0.261     0.356     0.270     0.171
  Max         1.477     0.417     0.388     0.308     0.194
T-t = 6
  Mean        1.187     0.733     0.400     0.356     0.264
  (SE)       (0.174)   (0.087)   (0.007)   (0.004)   (0.002)
  Min         0.538     0.425     0.373     0.344     0.255
  Max         2.377     1.352     0.448     0.377     0.274

Table 13: Estimated prediction errors for the absolute tracking error of a delta-hedging strategy using an MLP 
network with a single hidden layer containing four units, for call options with strike price X and time-to-maturity 
T − t months on a non-dividend-paying stock currently priced at $50, estimated across 500 independent test paths. 
See Table 12 for further details. 

PPR          X = 40    X = 45    X = 50    X = 55    X = 60

T-t = 1
  Mean        0.198     0.121     0.271     0.147     0.081
  (SE)       (0.094)   (0.005)   (0.006)   (0.004)   (0.024)
  Min         0.028     0.101     0.245     0.131     0.028
  Max         0.991     0.144     0.301     0.167     0.261
T-t = 3
  Mean        1.180     0.275     0.276     0.238     0.247
  (SE)       (0.299)   (0.056)   (0.006)   (0.011)   (0.046)
  Min         0.134     0.174     0.254     0.202     0.136
  Max         3.113     0.759     0.309     0.320     0.555
T-t = 6
  Mean        2.140     1.056     0.383     0.367     0.443
  (SE)       (0.383)   (0.201)   (0.045)   (0.029)   (0.074)
  Min         0.511     0.246     0.259     0.268     0.224
  Max         4.337     2.325     0.719     0.589     0.931

Table 14: Estimated prediction errors for the absolute tracking error of a delta-hedging strategy using a PPR network 
with four projections, for call options with strike price X and time-to-maturity T − t months on a non-dividend-paying 
stock currently priced at $50, estimated across 500 independent test paths. See Table 12 for further details. 


Linear-1     X = 40    X = 45    X = 50    X = 55    X = 60

T-t = 1
  Mean        1.047     0.967     0.911     1.672     1.879
  (SE)       (0.096)   (0.091)   (0.036)   (0.091)   (0.098)
  Min         0.561     0.507     0.813     1.251     1.425
  Max         1.492     1.393     1.132     2.135     2.375
T-t = 3
  Mean        1.849     1.486     1.697     2.624     3.015
  (SE)       (0.172)   (0.117)   (0.049)   (0.153)   (0.163)
  Min         0.983     0.959     1.580     1.936     2.260
  Max         2.649     2.091     2.013     3.411     3.845
T-t = 6
  Mean        2.276     2.124     2.170     2.910     3.780
  (SE)       (0.213)   (0.149)   (0.073)   (0.173)   (0.214)
  Min         1.208     1.495     2.000     2.170     2.805
  Max         3.275     2.926     2.629     3.821     4.879

Table 15: Estimated prediction errors for the absolute tracking error of a delta-hedging strategy using a 
single-regression model, for call options with strike price X and time-to-maturity T − t months on a non-dividend-paying 
stock currently priced at $50, estimated across 500 independent test paths. See Table 12 for further details. 


Linear-2     X = 40    X = 45    X = 50    X = 55    X = 60

T-t = 1
  Mean        0.212     0.207     0.724     0.455     0.518
  (SE)       (0.018)   (0.013)   (0.011)   (0.034)   (0.045)
  Min         0.154     0.168     0.681     0.335     0.344
  Max         0.340     0.304     0.776     0.628     0.739
T-t = 3
  Mean        0.371     0.555     1.054     0.836     0.790
  (SE)       (0.029)   (0.003)   (0.013)   (0.024)   (0.067)
  Min         0.277     0.539     0.995     0.767     0.539
  Max         0.586     0.566     1.118     0.972     1.130
T-t = 6
  Mean        0.500     0.955     1.544     1.454     1.042
  (SE)       (0.027)   (0.008)   (0.022)   (0.019)   (0.055)
  Min         0.412     0.909     1.452     1.373     0.880
  Max         0.709     0.988     1.650     1.563     1.342

Table 16: Estimated prediction errors for the absolute tracking error of a delta-hedging strategy using a two-regression 
model, for call options with strike price X and time-to-maturity T − t months on a non-dividend-paying stock currently 
priced at $50, estimated across 500 independent test paths. See Table 12 for further details. 

Short term          Linear-1   Linear-2    RBF      PPR      MLP      B-S      C(0)
In the money          6.70       4.92      5.04     4.52     4.94     4.42    24.26
Near the money        8.70       4.12      3.49     3.37     3.42     2.76     8.04
Out of the money      8.38       2.71      2.17     2.31     1.63     1.59     1.00

Medium term         Linear-1   Linear-2    RBF      PPR      MLP      B-S      C(0)
In the money          9.48       6.41      6.70     6.53     5.62     5.93    35.88
Near the money        8.82       6.93      4.18     5.02     4.54     5.31    10.62
Out of the money     11.27       4.69      2.53     2.73     2.32     2.55     2.74

Long term           Linear-1   Linear-2    RBF      PPR      MLP      B-S      C(0)
In the money          8.23       6.14      7.24    11.40     5.60     7.58    39.27
Near the money        8.55       8.58      6.37     5.55     5.17     6.18    16.14
Out of the money     12.13       7.35      3.54     5.39     4.36     5.02     6.86

Table 17: Delta-hedging prediction error for the out-of-sample S&P 500 data from July 1988 to December 1991, i.e., 
excluding the subperiods directly influenced by the October 1987 crash, averaged across all training/test sets. 


Short term          Linear-1   Linear-2    RBF      PPR      MLP      B-S      C(0)
In the money         10.61       8.80      7.27     9.23     9.12     3.94    20.18
Near the money       16.30      12.73      7.77     7.48     8.08     9.09    10.76
Out of the money     23.76       8.48      7.43     5.51     5.34    10.53     5.44

Medium term         Linear-1   Linear-2    RBF      PPR      MLP      B-S      C(0)
In the money          9.18      11.17      7.13    12.57    13.90    16.00    36.05
Near the money       24.48      13.36      7.59     5.65     5.11     6.12    12.98
Out of the money     34.31      14.80     12.30     9.44     9.64    13.46     7.45

Long term           Linear-1   Linear-2    RBF      PPR      MLP      B-S      C(0)
In the money         24.97      22.37     13.84    23.75    27.13    30.36    28.08
Near the money       35.06      12.93     10.78    10.11    12.27    16.03    16.98
Out of the money     29.07      14.05      9.50     8.59     8.10    10.86    10.26

Table 18: Delta-hedging prediction error for the out-of-sample S&P 500 data from July 1987 to July 1988, i.e., the 
subperiods directly influenced by the October 1987 crash, averaged across all training/test sets. 



         Linear-1   Linear-2    RBF       PPR       MLP       B-S
Min        7.85      82.63      81.33     92.26     92.28     37.41
Mean      75.57      95.54      93.26     96.56     95.53     84.76
Max       95.74      99.44      98.41     99.54     98.98     99.22

Table 19: Out-of-sample R² values [in percent] for the learning networks, summarized across the 9 out-of-sample 
S&P 500 futures options test sets. 






                    Linear-1   Linear-2    RBF       PPR       MLP
Jul 87 - Dec 87      0.160      0.377      0.506     0.593     0.580
Jan 88 - Jun 88      0.189      0.357      0.476     0.497     0.538
Jul 88 - Dec 88      0.122      0.341      0.382     0.358     0.301
Jan 89 - Jun 89      0.221      0.405      0.534     0.550     0.481
Jul 89 - Dec 89      0.355      0.428      0.529     0.609     0.543
Jan 90 - Jun 90      0.329      0.423      0.557     0.550     0.631
Jul 90 - Dec 90      0.230      0.425      0.540     0.569     0.649
Jan 91 - Jun 91      0.296      0.419      0.497     0.346     0.313
Jul 91 - Dec 91      0.248      0.337      0.218     0.327     0.317

Table 20: Fraction of out-of-sample test set S&P 500 futures options in which the absolute delta-hedging error for 
each learning network was lower than the Black-Scholes delta-hedging error, shown for each test period. 


Pair                t-statistic    p-value
Linear-1 vs B-S      -15.1265      1.0000
Linear-2 vs B-S       -5.7662      1.0000
RBF vs B-S             2.1098      0.0175
PPR vs B-S             2.0564      0.02
MLP vs B-S             3.7818      0.0001

Table 21: Paired t-test comparing relative magnitudes of absolute hedging error, using results from all S&P 500 
test sets, i.e., data from July 1987 to December 1991. The degrees of freedom for each test were 1299, although see 
comments in the text concerning dependence. 
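
The mechanics of such a paired comparison can be sketched as follows. The sign convention (Black-Scholes error minus network error, so that a positive statistic favors the network) and the one-sided upper-tail p-value are our assumptions, inferred from the pattern of the reported values rather than stated in the caption. The normal approximation to the Student t distribution is a further simplification that is reasonable only because the degrees of freedom are large, and the function names are hypothetical.

```python
import math
import statistics


def paired_t_statistic(errors_a, errors_b):
    """t-statistic for the mean of the paired differences errors_a - errors_b."""
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    n = len(diffs)
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))


def upper_tail_p_normal(t_stat):
    """One-sided p-value P(Z > t_stat) under a standard normal approximation."""
    return 0.5 * math.erfc(t_stat / math.sqrt(2.0))


# Applying the approximation to two of the reported t-statistics.
p_rbf = upper_tail_p_normal(2.1098)
p_linear1 = upper_tail_p_normal(-15.1265)
```

Under these assumptions the approximate p-values are close to the tabulated 0.0175 for RBF and 1.0000 for Linear-1. As the caption cautions, dependence among the paired observations means that the nominal degrees of freedom, and hence these p-values, should be interpreted with care.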

Figure 1: Structure of the learning networks used in this paper. 

Figure 2: Generalization error E(N,n) for a Gaussian RBF network as a function of the number of data points N 
and the number of network parameters n [reprinted with permission from Niyogi and Girosi (1994)]. 








Figure 4: Simulated call option prices normalized by strike price and plotted versus stock price and time to expiration. 
Points represent daily observations. Note the denser sampling of points close to expiration is due to the CBOE 
strategy of always having options which expire in the current and next month. 


Figure 6: Input points in the training set and test set for the RBF network with the largest error measure ξ 
[axes: S/X versus T]. 

Figure 8: S&P 500 futures and futures options active from July through December 1989. The dashed line represents 
the futures price, while the arrows represent the options on the future. The y-coordinate of the tip of each arrow 
indicates the strike price [arrows are slanted to make different introduction and expiration dates visible]. 

Figure 9: July through December 1989 S&P 500 futures call option prices, normalized by strike price and plotted 
versus stock price and time to expiration. Points represent daily observations. Note the bumpiness of the surface, 
and the irregular sampling away from the money. 

Figure 10: Black-Scholes parameters estimated from S&P 500 data (see text for details). Values for σ fall between 
9.63% and 94.39%, with a median of 16.49%. 